UNDERCONTROLLED AND OVERCONTROLLED PERSONALITY
TYPES IN EXTREME ANTISOCIAL AGGRESSION¹


EDWIN I. MEGARGEE²
University of California, Berkeley


Physical aggression is typically attributed to inadequate control. While this
is the pattern in 1 type of physically aggressive person, it is proposed that in
another type, the Chronically Overcontrolled, rigid inhibitions against overt
aggressive behavior will be found. Aggression by such people is apt to be of
murderous intensity as aggressive impulse must build up to higher levels to
overcome such inhibitions and since alternative means of expressing aggression
have not been learned. This suggests that in comparison with other
criminal groups, a murderously assaultive group will be assessed as less
hostile, less aggressive, and more controlled. An empirical study of 4 groups
of assaultive and nonviolent delinquents supports this prediction. Implications
of this finding for practice and theory are discussed.


¹ Based on a doctoral dissertation submitted to
the Department of Psychology at the University
of California, Berkeley. The author wishes to express
his appreciation to the members of his
doctoral committee, Hubert Coffey and Irving
Piliavin, and, most especially, to the chairman,
Gerald A. Mendelsohn, for assistance in the design,
execution, and interpretation of the study. He
also wishes to thank Lorenzo S. Buckley, Chief
Probation Officer of Alameda County, California,
and the staffs of the Guidance Clinic and of
Juvenile Hall who collected and transcribed the
data. Finally, he wishes to thank the University of
California Computer Center for donating time on
the IBM 7090.

² Now at the University of Texas.


Aggression and violence are more than
ever becoming major problems in the
United States. The names of Los Angeles,
Rochester, and St. Augustine have joined
Bunker Hill, Gettysburg, and the Little
Big Horn as American battlegrounds. There
is also concern regarding individual violence.
In a single week a national magazine
reported the cases of two 22-year-old boys,
one (a “gentle, easy-going, good natured”
young man) who 5 days after graduation
killed three unarmed victims during a bank
robbery and the other (a “mild and loving”
person) who shot his twin brother (Newsweek,
1965).

When we try to apply information
gleaned from empirical studies of aggression
to events such as these we find a great
gap between the aggression described in
our journals and that described in our
newspapers. Most empirical data have been
collected either in the laboratory under 
controlled conditions or in the schoolyard 
using the method of naturalistic observa- 
tion. In either case, the amount of extreme 
aggression that can take place is seriously 
curtailed, either by the experimenter’s 
ethics or by the intervention of school 
personnel. For this reason most of our data 
concern relatively mild forms of aggression 
and the psychologist must extrapolate to 
account for more extreme aggression such 
as assault or homicide. 

The general formulation that has emerged 
from empirical studies of relatively mild 
aggression is that the overtly aggressive 
person has fewer controls and more need or 
instigation for aggression than does the 
overtly nonaggressive person. 

The practical implications of this are 
clear: the way to discourage a person from 
acting aggressively is to build up his controls.
Our prisons and reformatories typically
base their programs upon this principle
by instituting rewards for control and
punishments for aggression. When an individual
has demonstrated his controls by
behaving in a nonaggressive fashion for a 
sufficiently long period, he is considered to 
be rehabilitated and is considered for re- 
lease. 

However, there is reason to believe the 
dynamics underlying an extremely assaultive
offense such as homicide may be quite
different from the dynamics found in milder
aggressive behavior.³ In case after case the
extremely assaultive offender proves to be 
a rather passive person with no previous 
history of aggression. In Phoenix an 11- 
year-old boy who stabbed his brother 34 
times with a steak knife was described by 
all who knew him as being extremely 
polite and soft spoken with no history of 
assaultive behavior. In New York an 18- 
year-old youth who confessed he had as- 
saulted and strangled a 7-year-old girl in a 
Queens church and later tried to burn her 
body in the furnace was described in the 
press as an unemotional person who planned 
to be a minister. A 21-year-old man from 
Colorado who was accused of the rape and 
murder of two little girls had never been a 
discipline problem and, in fact, his stepfather
reported, “When he was in school
the other kids would run all over him and
he'd never fight back. There is just no
violence in him.” In these cases the homicide
was not just one more aggressive
offense in a person who had always dis- 
played inadequate controls, but rather a 
completely uncharacteristic act in a person 
who had always displayed extraordinarily 
high levels of control. 

There are empirical as well as anecdotal 
data which indicate that extreme and 
moderate aggressive behavior might be 
characterized by different dynamics. For 
instance, in a study of the MMPI in which 
hostility scale scores of assaultive and
nonassaultive criminals were compared, Megargee
and Mendelsohn (1962) found a pattern
of reversals with the assaultive subjects
being tested as having more control and
less hostility than the nonassaultive criminals
or normals. This led them to suggest


that the extremely assaultive person is often a
fairly mild-mannered, long-suffering individual who
buries his resentment under rigid but brittle controls.
Under certain circumstances he may lash out
and release all his aggression in one, often
disastrous, act. Afterwards he reverts to his usual
overcontrolled defenses. Thus he may be more of
a menace than the verbally aggressive
“chip-on-the-shoulder” type who releases his aggression in
small doses.


³ The writer tends to classify aggression as
“extreme,” “moderate,” or “mild.” The term “extreme”
is reserved for physical aggression of homicidal
intensity; the term “moderate” is used to describe
physical aggression less likely to kill or maim the
victim and in which there is more adequate
justification for the aggressive response; “mild” is a
term reserved for most verbal aggression and for
physical aggression which is not likely to seriously
injure the victim. Most schoolyard scuffles, the
majority of “fouls” in sporting events, and such
laboratory procedures as administering shock fall
into this category. More precise operational
definitions of “moderate” and “extreme” assault will be
given in the procedures section and in Appendix


This suggests the hypothesis that assaul- 
tive criminals can be divided into at least 
two quite distinct personality types: the 
Undercontrolled Aggressive type and the 
Chronically Overcontrolled type. 

The Undercontrolled Aggressive person 
corresponds to the typical conception of an 
aggressive personality found in the litera- 
ture. He is a person whose inhibitions 
against aggressive behavior are quite low. 
Consequently, he usually responds with
aggression whenever he is frustrated or provoked.
Since inhibitions are specific to the
situation, he will, occasionally, be in- 
hibited from expressing his aggression. For 
instance, he might not attack his mother 
or a judge even though they frustrate him. 
In such cases, however, the Undercontrolled 
Aggressive person will readily use the 
mechanism of displacement and find a sub- 
stitute target for his aggression or he may 
resort to the mechanism of response gen- 
eralization and make a less drastic response 
to the original frustrating agent. Because 
of his low level of inhibitions he is likely 
to be diagnosed as a sociopathic personal- 
ity, antisocial or dyssocial type. Hence his 
personality dynamics are likely to be simi- 
lar to those of many other people who 
have legal difficulties. 

The Chronically Overcontrolled type be- 
haves quite differently, however. His in- 
hibitions against the expression of aggres- 
sion are extremely rigid so he rarely, if 
ever, responds with aggression no matter 
how great the provocation. These inhibi- 
tions are not focused on a few specific 
targets, as was the case with the Under- 
controlled Aggressive type, but instead are 
quite general. He is, therefore, unable to 
make use of the mechanisms of displacement
or response generalization. The result
is that through some form of temporal sum- 
mation, such as described by Dollard, 
Doob, Miller, Mowrer, and Sears (1939, 
p. 31), his instigation to aggression builds 
up over time. In some cases, the instigation 
to aggression summates to the point where 
it exceeds even his excessive defenses. If 
this occurs when there are sufficient cues to 
aggression in the environment, an aggres- 
sive act should result. 

Because the inhibitions are so excessive, 
it would appear that when the Chronically 
Overcontrolled person finally does commit 
an aggressive act, his instigation to ag- 
gression should typically be at a higher 
level than that of the Undercontrolled or 
Habitually Aggressive person, simply because
more instigation is needed to overcome
such excessive inhibitions.⁴ If we assume
that the degree of violence of the
aggressive act is proportional to the degree 
of instigation, this suggests a way that 
this typology might be empirically verified. 
It would follow that a group of people who 
have committed extremely aggressive acts 
such as homicide or assault with a deadly 
weapon would be likely to include some 
people of the Chronically Overcontrolled 
type and some of the Undercontrolled Ag- 
gressive type. A group of people who have 
engaged in moderately aggressive behavior, 
such as fistfights, should, on the other hand,
consist almost exclusively of the Undercontrolled
Aggressive type. On various indexes
or measures of aggressiveness and
control, then, the extremely assaultive 
group should appear less aggressive and 
more controlled as a group than would 
either the moderately aggressive group or 
a nonassaultive sample. If, on the other 
hand, the prevailing view is correct and all 
assaultive people are undercontrolled, then 
an extremely assaultive group should show 
the most aggression and the least control 
relative to other groups. 
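
The reasoning above can be illustrated with a toy threshold model. The sketch below is not part of the study: the class name, the inhibition and instigation values, and the cutoff of 6.0 are all hypothetical, chosen only to show why a group selected for extreme acts could average higher in control than a group selected for moderate acts. It assumes, as the text does, that an act occurs only when instigation exceeds inhibition and that its violence is proportional to the instigation.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional

# All numbers below are hypothetical, chosen only to illustrate the argument.
EXTREME_CUTOFF = 6.0  # arbitrary intensity at or above which an act counts as "extreme"


@dataclass(frozen=True)
class Person:
    kind: str           # "under" (Undercontrolled) or "over" (Overcontrolled)
    inhibition: float   # strength of inhibitions against aggression
    instigation: float  # instigation to aggression at the moment of provocation


def act_intensity(p: Person) -> Optional[float]:
    """Return the intensity of the aggressive act, or None if inhibitions hold."""
    if p.instigation <= p.inhibition:
        return None
    # Assumption taken from the text: violence is proportional to instigation.
    return p.instigation


def split_by_severity(people):
    """Sort everyone who aggressed into 'moderate' and 'extreme' groups."""
    groups = {"moderate": [], "extreme": []}
    for p in people:
        intensity = act_intensity(p)
        if intensity is None:
            continue  # no overt aggression, so not in either assaultive group
        key = "extreme" if intensity >= EXTREME_CUTOFF else "moderate"
        groups[key].append(p)
    return groups


people = [
    Person("under", 1.0, 2.0),
    Person("under", 1.0, 3.0),
    Person("under", 1.0, 4.0),
    Person("under", 1.0, 7.0),   # an unusually severe provocation
    Person("over", 8.0, 5.0),    # instigation still below inhibition: no act
    Person("over", 8.0, 9.0),
    Person("over", 8.0, 10.0),
]

groups = split_by_severity(people)
mean_inhibition = {k: mean(p.inhibition for p in v) for k, v in groups.items()}
print(mean_inhibition)
```

In this toy population the moderate group contains only Undercontrolled cases, while the extreme group mixes both types, so its average inhibition is higher. This is the pattern the typology predicts; the prevailing view would predict the opposite ordering.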


⁴ This is not necessarily the case, of course. If
the Undercontrolled Aggressive person has been 
severely frustrated or provoked, it is possible that 
his instigation level, too, will be high. But by and 
large it is likely that most provocations will not be 
that extreme. 


REVIEW OF THE LITERATURE 


No prior studies have systematically ex- 
amined this hypothesis. Nevertheless, there 
are data in the literature which are relevant 
to it. The first source of data comes from 
the literature on psychological tests. In an 
effort to validate various tests or scales of 
aggression, a typical procedure has been 
to administer the test to “nonaggressive” 
and “aggressive” groups and observe the 
difference, if any. When the aggressive 
group is mildly or moderately aggressive, 
we would expect it to show more aggression 
and less control than the nonaggressive 
group. If, however, the aggressive group 
has engaged in extreme or homicidal ag- 
gression, then the typology outlined above 
would predict a “reversal,” that is, a meas- 
urement of the extreme group as being less 
aggressive or hostile and more controlled 
than the contrast group. 

The majority of studies have been of the 
first type: that is, they have used mild or 
moderately aggressive people in their cri- 
terion group. When significant differences 
have been found, they show the aggressive 
group as having higher aggression or hostil- 
ity scores on the tests. Purcell (1956) gave 
the Thematic Apperception Test (TAT) 
to three groups of army trainees referred 
for psychiatric study and found the most 
aggressive group to be highest in need Ag- 
gression (n Agg). Young (1956) also 
found high n Agg scores among her sample 
of institutionalized delinquents, while Mus- 
sen and Naylor (1954) found a significant 
positive relation between the amount of 
overt aggression displayed in detention 
and the amount of n Agg in a sample of 
juvenile delinquents. 

Results are similar with the Rorschach. 
Studies of assaultive hospital patients 
have shown them to be higher on a num- 
ber of scales of hostile content (Finney, 
1954; Sommer & Sommer, 1958; Stor- 
ment & Finney, 1953; Towbin, 1959). In a 
group of convicts, Rader (1957) found a 
positive correlation between his Rorschach 
hostile-content scale and aggressive re- 
marks made in group therapy sessions, 
while Gorlow, Zimet, and Fine (1952) reported
that delinquents scored higher than
nondelinquents on Elizur’s (1949) scale of
hostility. 

Thus, studies of mild and moderately as- 
saultive psychiatric patients or delinquents 
indicate that they have fewer inhibitions 
against expressing aggression on projective 
tests than do nonaggressive groups. Among 
normal subjects also, the usual finding has 
been that the amount of aggression shown 
on the test varies directly with a number 
of criteria of overt aggression (Elizur, 1949; 
Lindzey & Tejessy, 1956; Murstein, 1956;
Pattie, 1954; Walker, 1951). 

The first aspect of the typology, there- 
fore, appears to be well established: mildly 
aggressive and moderately aggressive sub- 
jects are relatively undercontrolled. This is, 
of course, in accord with the prevailing 
view. The critical question is whether 
extremely assaultive subjects are overcon- 
trolled. The data here are much less ade- 
quate, but a few studies have used sub- 
jects who might be classified as extremely 
assaultive. Stone (1953) administered the 
TAT and Rorschach to three groups of 
military prisoners who differed in aggres- 
siveness. The most aggressive group con- 
sisted of 31 men in prison for assaults or 
murders who had at least two prior 
offenses of this type. Stone got mixed results.
He found that on the TAT the most
aggressive group manifested significantly 
more aggression than did the other two. 
On the Rorschach, however, the medium 
aggressive group had significantly more 
aggression than did the most aggressive 
group, which in turn was tested as significantly
more aggressive than the least
aggressive group.

While it is difficult to explain these data 
fully, it is clear that they do not offer much 
support for the hypothesis that extremely 
assaultive people are often overcontrolled. 
However, it should be pointed out that the 
mixing of assault cases with murderers in 
the most aggressive group (with unknown 
proportions of each) would work against 
the hypothesis. More important, limiting 
the most aggressive group to men who had 
at least two prior assaultive offenses would 
almost certainly screen out Overcontrolled
Assaultive offenders and limit the group to 
the Undercontrolled Assaultive type. 


A study by Weinberg (1953) is more 
relevant to the present writer’s hypothesis. 
He also used three groups, but his test meas- 
ure of aggressiveness was the Rosenzweig 
Picture-Frustration (P-F) Study. Wein- 
berg’s first group consisted of 22 male 
prisoners in the Oregon State Penitentiary 
for felonious assault, assault with a deadly 
weapon, or assault with intent to kill. This 
sample can be considered extremely assaul- 
tive. His second group consisted of 27 non- 
assaultive prisoners who were confined for 
forgery. His third group consisted of 43 
normal, noninstitutionalized job applicants 
who were matched with the prisoners for 
occupation, age, and education. All subjects 
were told that the test was for research 
purposes only. 

Weinberg found that the extremely as- 
saultive group obtained Extrapunitiveness 
scores significantly below those of the 
forgers, who in turn scored significantly 
lower than the normals. In a personal com- 
munication to Weinberg, Rosenzweig sug- 
gested that perhaps the prisoners censored 
their responses, but Weinberg pointed out 
that the job applicants were also moti- 
vated to appear in a good light. It is clear, 
however, that prisoners can, and sometimes 
will, alter their P-F responses so as to 
appear less aggressive (Megargee, 1964). 
However, the fact that the extremely 
assaultive group was significantly lower 
than the nonaggressive group of prisoners, 
who were probably equally motivated to 
dissimulate, is consistent with the hypothesis
that extremely assaultive prisoners as a
group may be relatively overcontrolled.

A study by Megargee and Mendelsohn 
(1963) set out to test this hypothesis di- 
rectly. Three groups of criminals who were 
candidates for probation were compared on 
an index based on Murstein’s (1956) 
Rorschach Hostility Scale. The extremely
assaultive group consisted of 21 men who 
had been convicted of murder, assault 
with a deadly weapon, voluntary man- 
slaughter or mayhem. The moderately 
assaultive group consisted of 21 men con- 
victed of battery. The nonviolent criminal 
group consisted of 27 men randomly selected
from those convicted of nonaggressive
crimes. It was predicted that the
moderately aggressive group would score
highest on the Rorschach Hostility Index, 
the nonviolent group next, and that the 
extremely assaultive group would score 
lowest. 

As the data in Table 1 indicate, the 
trend of the data was in the predicted 
direction; however, the differences between 
the groups failed to reach acceptable levels 
of statistical significance when tested by 
analysis of variance. 

A more recent study by these same 
writers in which an assault scale for the 
MMPI (As-8r) was derived and cross-validated
lends additional support to this
hypothesis.⁵ On the cross-validation of this
scale, the extremely assaultive group (con- 
victed of murder, manslaughter, mayhem, 
or assault with a deadly weapon) scored 
significantly higher than moderately as- 
saultive, nonviolent criminal, or normal 
groups. Examination of the content of the 
scale (which had been derived from em- 
pirical item analyses) showed that the 
items were surprisingly passive and non- 
aggressive in nature. Moreover, examina- 
tion of the MMPI validity scales indicated 
that this was not the result of dissimulation.
The As-3r scale was found to correlate
positively with scales of repression, con- 
formity, and control and negatively with 
scales of hostility and acting-out such as 
Pd. In short, the pattern that emerged 
was consistent with what would be expected 
if the scale were detecting Chronically 
Overcontrolled people whose repressed 
hostility had broken through into be- 
havior (Megargee, 1965). 

The literature on psychological tests thus 
supports the notion that mildly or moder- 
ately aggressive people show more aggres- 
sion (and less inhibition of aggression) on 
psychological tests than do nonaggressive 
people. The literature on the test perform- 
ance of extremely aggressive subjects is 
much less conclusive; however, some findings
were noted which were at least consistent
with the notion that extremely
assaultive people may be relatively
overcontrolled.


⁵ Megargee, E. I., and Mendelsohn, G. A. The
assessment of the chronically overcontrolled assaultive
offender. Mimeographed manuscript, 1965.
Available on request from the senior author,
Psychology Department, University of Texas, Austin,
Texas 78712.


TABLE 1
RORSCHACH HOSTILITY INDEXES FOR THREE
GROUPS OF CRIMINALS FROM MEGARGEE
AND MENDELSOHN (1963)

Group                   N     Mean RHI   SD

Extreme assaultive      21    4.184      3.10
Moderate assaultive     21    7.137      10.1
Nonviolent criminals    27    0.197      5.31

A more fruitful, although less rigorous, 
source of data about extremely assaultive 
people comes from case studies reported in 
the literature. Demographic studies consistently
show that a large proportion of
persons convicted of homicide has no prior
history of assaultive behavior (Berg & Fox, 
1947; Berkowitz, 1962; Wolfgang, 1957). 
Moreover, they are generally better be- 
haved while incarcerated and have lower 
recidivism rates after release than do most 
other groups of prisoners (Berkowitz, 1962, 
p. 318). This is, of course, the pattern we 
would expect if they were Chronically
Overcontrolled. 

Adolescent murderers have been found 
to come, for the most part, from adequate 
or superior homes and to have excellent 
reputations prior to the offense (Stearns, 
1957; Wickham, 1956). In his study of all 
teenage murderers referred to a court 
clinic, Wickham (1956) noted that most 
suffered from a lack of socially acceptable 
emotional outlets, thereby building up tensions
and pressures which resulted in a
crime of violence.

Schultz (1960) studied four probationers 
who had assaulted their wives with intent 
to kill. He found in general, “. . . a submissive,
passive individual who avoided conflict
at all costs.” He noted a pattern of extreme
dependency with rigid control over
aggressive impulses as long as the de- 
pendency was gratified. When the wife 
permanently withdrew this gratification by 
leaving or taking a lover, the control system 
broke down, and the murderous assault 
took place. 

Lamberti, Blackman, and Weiss (1958)
and Weiss, Lamberti, and Blackman
(1960) studied a group of 13 people who, 
without any prior record of antisocial be- 
havior, suddenly committed a homicide. 
Their findings were in striking agreement 
with Schultz (1960). They found the 
mothers of these murderers had emphasized 
conformity to the rules of the social system. 
To gain affection the future murderers had 
had to deny or repress any hostility. Both 
clinically and on tests they appeared intro- 
verted, insecure, helpless, and unable to 
assert themselves. The authors concluded, 
“. . . [the patients’] difficulties came about
because of their needs to conform and because
of their inability to act out hostility
in ways which they would feel might still
be socially acceptable [Weiss, Lamberti, &
Blackman, 1960, p. 675].”

Kahn (1959) compared murderers and 
burglars on a battery of tests and case 
history data and concluded that the 
murderers had been significantly more 
stable and conforming than the burglars. 
He found the murderers to have personali- 
ties which could permit breakthrough of 
ordinarily rigidly repressed sadistic hostil- 
ity and to have fewer personality resources 
for expression of feelings. 

These case studies of extremely assaul- 
tive subjects are thus consistent with the 
typology of assaultive offenders which has 
been suggested. However, a more systematic 
and rigorous study of the hypothesis is ob- 
viously called for. While the case studies 
cited above represent all those found in a 
fairly thorough, although by no means ex- 
haustive, search of the psychological litera- 
ture, sources of bias could easily enter. 
Clinicians are naturally much less prone to 
report cases which conform to the general 
expectation, and editors in turn are less 
likely to devote journal space to such 
studies. What is needed, obviously, is a 
study in which the subjects are selected 
without bias and in which systematic 
quantitative observations are made of 
several groups falling at different points 
along the continuum of aggressive be- 
havior. The study to be described was de- 
signed to meet this need. 


SUBJECTS AND GENERAL PROCEDURES 


In order to evaluate the hypothesis that ex- 
tremely assaultive subjects, as a group, will be 
measured as being low in aggression and high in 
impulse control, four groups of male juvenile 
delinquents were selected for study. In the first 
two groups were all 30 boys who had been detained 
for serious assaultive crimes in the Alameda 
County, California, Juvenile Hall during the 10- 
month period from July 1, 1962, to May 1, 1963. 
In June 1963, after data collection was completed, 
the probation officers’ reports of the offenses 
written for the Juvenile Court were collected and 
examined. The crimes were then rated for amount 
of aggressiveness on a 10-point Aggression scale 
devised by the writer. This scale took into account 
not only the behavior of the defendant, but also 
such variables as the degree of provocation, the 
subcultural setting, the immediate stimulus situa- 
tion, the relative size and armaments of victim and 
defendant, and the extent of injuries. (See Appen- 
dix A.) Ratings were made by the investigator, 
who had had 3 years experience working with 
delinquents, and another clinical psychologist with 
8 years of such experience. Preliminary ratings 
were made, serious discrepancies discussed, and the 
final ratings of aggressiveness made independently. 
Adequate reliability was achieved with a correla- 
tion of .94 between the two sets of final ratings. 
When discrepancies existed the final ratings for 
each subject were averaged. (See Appendix B for 
descriptions of the offenses and the final ratings.) 
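
The reliability check and the averaging of the two raters' final ratings can be sketched as follows. The ratings shown are invented for illustration and are not the study's data; the function names are hypothetical, and the Pearson product-moment coefficient is assumed to be the correlation intended, since the text does not name the coefficient.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def final_ratings(rater_a, rater_b):
    """Average the two raters; identical ratings are unchanged by averaging."""
    return [(a + b) / 2 for a, b in zip(rater_a, rater_b)]


# Hypothetical ratings on the 10-point scale for six offenses (not study data).
rater_a = [2, 3, 5, 7, 9, 4]
rater_b = [2, 4, 5, 8, 9, 3]

r = pearson_r(rater_a, rater_b)
averaged = final_ratings(rater_a, rater_b)
print(round(r, 2), averaged)
```

Averaging every pair gives the same result as averaging only the discrepant pairs, since two identical ratings average to themselves.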

The scale was then dichotomized and the nine 
subjects who had scored in the range from 6.0 to 
10 were operationally defined as the Extremely 
Assaultive (EA) group. This group included two 
cases of homicide, an attempted murder, five 
assaults with a deadly weapon, and one particularly 
brutal beating. The remaining 21 subjects, who had 
scored below 6.0, were defined as being Moderately 
Assaultive (MA). This group consisted primarily 
of cases of battery and gang fights. 
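
The dichotomization just described amounts to a single cutoff on the averaged rating. A minimal sketch follows; the subject labels and ratings are made up for illustration, and only the cutoff of 6.0 comes from the text.

```python
CUTOFF = 6.0  # from the text: ratings of 6.0 to 10 define the EA group


def assign_group(rating):
    """Return 'EA' for ratings of 6.0 or above, 'MA' for ratings below 6.0."""
    return "EA" if rating >= CUTOFF else "MA"


# Hypothetical averaged ratings keyed by made-up subject labels (not study data).
ratings = {"S01": 9.5, "S02": 6.0, "S03": 5.5, "S04": 3.0, "S05": 7.5}

groups = {"EA": [], "MA": []}
for subject, rating in ratings.items():
    groups[assign_group(rating)].append(subject)

print(groups)
```

A rating of exactly 6.0 falls in the EA group, in keeping with the stated range of 6.0 to 10.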

Since these two groups together comprised all 
the seriously assaultive delinquents apprehended 
during this 10-month period, the 30 assaultive sub- 
jects were regarded as a population and all such 
factors as race, age, intelligence, and so forth were 
left free to vary in accord with the principles of 
representative design. As it developed, the EA subgroup
tended to be somewhat younger and to have fewer
Negro and more first offenders than the MA subgroup.
(The latter relationship was, in fact,
predicted.) 

In order to add generality to the study and
test the hypothesis that EA offenders tend to be
overcontrolled relative to other delinquents rather
than relative only to MA delinquents, two contrast
groups of nonassaultive delinquents were also
selected for study. They were matched for race,
age, and recidivism rate with the total assaultive
population (Groups EA and MA combined). The
first contrast group (Group I) of 20 boys was
selected from among those boys detained for
Incorrigibility: unruliness, defiance, and
unmanageability in the home. This group was included since
it was felt that they were likely to be high on
verbal aggressiveness. The second (Group PO) was
selected from among those boys detained for
property offenses such as auto theft or burglary.
(See Table 2.)


⁶ Only those assaults in which the injury of the
victim appeared to be the primary motive were
included. Other assaults for other ends during
which the victim may have been incidentally
injured were excluded. For instance, no cases of
forcible rape are included in the sample. (See
Appendix B).


TABLE 2
SUBJECT COMPOSITION OF THE FOUR GROUPS

Variable            EA              PO              I               MA

N                   9               26              20              21
Age range           14-11 to 16-9   11-1 to 17-4    11-2 to 17-9    11-3 to 17-7
Mean age            14-5            15-5            15-3            15-4
% Negro             44.5            57.7            60.0            66.7
% First detention   77.7            23.0            35.0            28.6
Mean IQ             93.8            91.8            97.3            97.0
IQ range            73-125          67-107          64-140          71-147

Neither of the two contrast groups included 
any boys who had known records for assaultive 
crimes. Boys known in advance to be mentally 
retarded (i.e., to have an IQ below 70) were also 
excluded from the study. 

Each of the 76 subjects was observed during the 
first 10 days of detention by the custodial staff 
(which was not informed of the hypotheses being 
tested) of the unit to which he was assigned. At 
the end of the third day of detention, each 
counselor filled out a behavior check list and a 
set of behavior rating scales.⁷ (See Appendixes C
and D.) At the end of 10 days, a second behavior 
check list and set of rating scales was filled out in 
addition to the Gough Adjective Check List.⁸

During this period, each boy was examined by 
a clinical psychologist (other than the investigator) 
from the Probation Department Guidance Clinic. 
The boy was not told that he was a research 
subject and was treated like any other diagnostic 
referral to the clinic with the exception that he 
was given a standardized interview and test battery 
and his responses were tape-recorded. As in the 


"In order to ensure maximum comparability 
with the Mussen and Naylor (1954) study, the 
behavior check list and rating scales as well as 
the directions to the raters, were exact duplicates 
of those used by them. The writer is grateful to 
H. Kelley Naylor for providing these forms as well 
as detailed instructions for scoring the TAT accord- 
ing to his system. 

⁸ The writer is grateful to Harrison G. Gough
for granting limited permission to duplicate his 
Adjective Check List for use in this study. 


case of any referral the boy was told that the 
psychological assessment was for the purpose of 
aiding the court in deciding on a disposition. This 
had the advantage of insuring that the results of 
the testing could be generalized to the routine 
clinical situation; it had the disadvantage of en- 
couraging the boys to present themselves in the 
most favorable light. 

The standardized interview was a condensation 
of that used by Bandura and Walters (1959) in 
their study of adolescent aggression. The questions
used focused on the subject’s aggressive behavior 
toward teachers, parents, and peers. Included in 
the test battery were the California Psychological 
Inventory (CPI), the Rosenzweig P-F Study, the 
TAT, the Holtzman Inkblot Test (HIT), and a 
brief intelligence measure consisting of the In- 
formation and Picture Completion subscales of 
the Wechsler Intelligence Scale for Children 
(WISC) or the Wechsler Adult Intelligence Scale 
(WAIS).⁹

Verbatim typescripts of the recorded interview, 
TAT, and HIT were prepared by a stenographer, 
who also removed identifying information from 
the test protocols and assigned each an identification
number from a table of random numbers. The
tests and interviews were then turned over to the 
writer for scoring. The Rosenzweig P-F Study, 
CPI, intelligence measure, and HIT were scored 
by standard procedures. In the case of the inter- 
views, ratings were made on the scales prepared 
by Bandura and Walters (1959), while for the 
TAT the scoring procedures used by Mussen and 
Naylor (1954) were adopted.¹⁰
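
The blind-coding step, in which each protocol receives a random identification number before scoring, can be sketched as follows. This is an illustration only: a seeded pseudorandom generator stands in for the printed table of random numbers, and the function name, the four-digit ID range, and the subject labels are all hypothetical.

```python
import random


def assign_blind_ids(subjects, seed=None, low=1000, high=9999):
    """Map each subject to a unique random ID number drawn from low..high."""
    rng = random.Random(seed)
    # sample() draws without replacement, so no two subjects share an ID.
    ids = rng.sample(range(low, high + 1), k=len(subjects))
    return dict(zip(subjects, ids))


subjects = ["boy_A", "boy_B", "boy_C", "boy_D"]  # hypothetical labels

# The key linking names to numbers stays with the stenographer.
key = assign_blind_ids(subjects, seed=1963)

# The scorer sees only the ID numbers, in an order unrelated to the names.
blinded = sorted(key.values())
print(blinded)
```

Keeping the key separate from the protocols is what allows the scorer to remain unaware of each boy's group until scoring is complete.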

A final source of data was the Probation Officer’s 
report to the court which contained a social his- 


° The Information subscale was chosen because 
of all the verbal scales it has the highest correla- 
tion with the WAIS Full Scale IQ for 18-19 year 
olds and with the WISC Full Scale IQ for 13 year 
olds. The Picture Completion subtest was chosen 
because of all the performance scales it has the 
highest correlation with the WISC Full Scale IQ 
for 13 year olds (Wechsler, 1949, 1955). 

* Detailed descriptions of the instruments and 
scoring procedure will be found when each instru-
ment is discussed individually below.

TABLE 3
SUMMARY OF VARIABLES AND HYPOTHESES

                                                               EA < PO,   EA > PO,
Source                  No.  Variable name                     I, and MA  I, and MA  EA < MA  EA > MA  EA < I

Predetention behavior
  Probation report       1   Incidence of first offenders                                     H-1
                         2   Good school attendance                       H-2                 H-3
                         3   Good school conduct                          H-4                 H-5
                         4   Incidence of solitary offenses                                   H-6
Behavior in detention
  Behavior check list    5   Total verbal aggression           H-7                                     H-8
                         6   Total physical aggression         H-9                   H-10
  Rating scale           7   Combined global ratings                      H-11                H-12
  Adjective check list   8   Overcontrol adjective index                  H-13                H-14
Psychological examination
  Structured interview   9   Reported physical aggression      H-15                  H-16
                             against peers
                        10   Reported physical aggression      H-17                  H-18
                             against authorities
  CPI                   11   Self-control                                 H-19                H-20
  Rosenzweig P-F study  12   Extrapunitiveness                 H-21                  H-22
  TAT                   13   Need aggression                   H-23                  H-24
  HIT                   14   Hostility                         H-25                  H-26
                        15   Movement minus color                         H-27                H-28


tory, a description of the offense and the individ-
ual’s past criminal record.¹¹

A total of 28 specific predictions were made
concerning the various dependent variables. All 
tested aspects of the general hypothesis that the 
EA group would be lower on measures of aggres- 
siveness and higher on measures of control than 
the other groups in general and the MA group 
in particular. In the case of measures of verbal 
aggressiveness it was hypothesized that Group 


¹¹ An effort was also made to see all the parents
for structured interviews condensed from those 
used by Bandura and Walters (1959). However, 
personnel problems, lack of cooperation on the 
part of some parents, as well as technical difficulties 
with the recording equipment, so limited the num- 
ber of usable interviews that the procedure was 
dropped from the data analysis. 


EA would be lower than the Incorrigible (I) group 
in particular. Table 3 summarizes the various 
hypotheses. 

The statistical tests used varied as a function 
of sample size and level of measurement. For 
Hypotheses 1 through 6 only classificatory or 
nominal scale data were available so Fisher's
Exact Probability Test was used for the small 
sample comparisons of the EA and MA groups 
while an adaptation of the binomial test was em- 
ployed for the larger sample comparisons of the 
EA group with the rest of the sample. For the 
remaining hypotheses ordinal scale measurement 
was attained so the Mann-Whitney U Test was 
employed (Siegel, 1956). For all but two of the 
hypotheses these tests made it possible to report 
the exact probability. Since directional predictions 


were made throughout, all the ps reported are one-
tailed.

Predetention Behavior


One difficulty with studies using criminal 
or delinquent subjects is that the commis- 
sion of the offense and the subsequent judi- 
cial procedures may change the person and 
influence the measures obtained. Therefore,
efforts were made to secure data about be- 
havior occurring prior to apprehension. 
Hypotheses were made about four aspects 
of behavior typically available for juvenile 
offenders: the number of prior detentions, 
the school attendance and conduct records, 
and whether the actual offense was com- 
mitted alone or as part of a group. 

Hypothesis 1. If the EA group contained 
a greater proportion of Chronically Over- 
controlled people, then it would be expected 
that this group as a whole would have ex- 
perienced fewer prior incarcerations in 
Juvenile Hall than the other delinquent 
groups. Since Groups I and PO had been 
selected to match the total assaultive popu- 
lation on this variable, no comparison 
with them could be made. However, 
recidivism played no part in the selection 
of Group MA. Accordingly, it was hy- 
pothesized that Group EA would have 
fewer subjects with prior detentions for 
any offense than would Group MA. The 
data in Table 4 support this contention. 
Only 22% of the Group EA boys had prior 
detentions, while over 70% of the Group 
MA boys had been previously confined, a 
difference significant at the .02 level. 
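
The exact probability for a 2 × 2 table such as Table 4 can be sketched as follows. This is only an illustration of the one-tailed Fisher test computed from the hypergeometric distribution; the function name and argument layout are hypothetical and are not taken from the study. Applied to the counts in Table 4 it gives a value of about .018, which agrees with the reported .02 after rounding.

```python
from math import comb


def fisher_one_tailed(a, b, c, d):
    """One-tailed Fisher exact probability for the 2 x 2 table

            group 1   group 2
    row 1      a         b
    row 2      c         d

    Returns the probability, with all margins fixed, of a count in the
    upper-left cell at least as large as the observed count a.
    """
    row1 = a + b          # total of the first row
    col1 = a + c          # total of the first column
    n = a + b + c + d     # grand total
    denom = comb(n, col1)
    upper = min(row1, col1)
    numer = sum(comb(row1, k) * comb(n - row1, col1 - k)
                for k in range(a, upper + 1))
    return numer / denom


# Counts from Table 4: first detentions (EA = 7, MA = 6),
# recidivists (EA = 2, MA = 15).
p_table4 = fisher_one_tailed(7, 6, 2, 15)
print(round(p_table4, 3))
```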

Hypotheses 2 and 3. It was predicted
that the EA subjects would have better
school attendance records than the other 
groups in general and Group MA in par- 
ticular. Fifty-six of the 76 court reports 


TABLE 4
INCIDENCE OF FIRST OFFENDERS FOR THE
EXTREMELY ASSAULTIVE AND MODERATELY
ASSAULTIVE GROUPS

                   Extremely     Moderately
                   assaultive    assaultive    Total
First detention        7              6          13
Recidivist             2             15          17
Total                  9             21          30
Note.—p = .02. 


included information concerning school at-
tendance records. Attendance was cate-
gorized as Satisfactory or Unsatisfactory.
Subjects who were not attending school 
due to suspension, exemption, or expulsion 
were not included in this analysis unless 
the nature of the attendance prior to the 
suspension was noted. 

In Table 5, the attendance records of the 
EA subjects are compared with those of the 
MA group and the rest of the sample. To 
test the hypothesis that the EA group 
would have better attendance than the MA 
group (Hypothesis 3), the Fisher Exact 
Probability Test was used (Siegel, 1956). 
This resulted in a p of .084. 

For the comparison of EA with the rest 
of the sample, including MA, an adaptation 
of the binomial test was employed. The 
proportion, P, of subjects with Satisfactory 
attendance records was calculated for the 
combined PO, I, and MA groups and found 
to be .34. The binomial test, corrected for
continuity, was then used to determine the 
probability that the proportion of .84 
which was observed in the EA sample was 
a chance deviation within the same popu- 
lation represented by the other samples 
(Guilford, 1956, p. 175 ff.; Siegel, 1956, pp. 


TABLE 5
SCHOOL ATTENDANCE RECORDS OF THE FOUR GROUPS WITH p VALUES OF THE TESTED DIFFERENCES

                                                     p value of tested comparisons
                            Group                    EA versus PO,     EA versus
                   EA    PO     I    MA    Total     I, and MA         MA
Satisfactory        6     5     4     7      22         .003             .084
Unsatisfactory      1    14    10     9      34
Total               7    19    14    16      56

TABLE 6
SCHOOL-CONDUCT RECORDS OF THE FOUR GROUPS WITH p VALUES OF THE TESTED DIFFERENCES

                                                     p value of tested comparisons
                            Group                    EA versus PO,     EA versus
                   EA    PO     I    MA    Total     I, and MA         MA
Satisfactory        3     4     2     8      17         .166             .258
Unsatisfactory      3    14    11    12      40
Total               6    18    13    20      57


36-42). This test resulted in a z of 2.56 
which had a one-tailed p of .003. As Siegel 
(1956) has pointed out, this procedure is 
not usually recommended with samples as 
small as this; however, the p value is so 
small it is unlikely that we would commit 
a Type 1 error in rejecting the null hy- 
pothesis. 
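
The normal approximation to the binomial test, corrected for continuity, can be sketched as follows. The function names are hypothetical, and the proportion for the combined groups is recomputed here from the counts in Table 5 rather than taken from the rounded figure in the text, so the resulting z need not agree exactly with the value reported above.

```python
from math import erfc, sqrt


def binomial_z(successes, n, p0):
    """z for an observed count of successes in n trials against a
    population proportion p0, with a continuity correction of 0.5
    applied toward the expected count."""
    expected = n * p0
    sd = sqrt(n * p0 * (1.0 - p0))
    if successes > expected:
        corrected = successes - 0.5
    elif successes < expected:
        corrected = successes + 0.5
    else:
        corrected = float(successes)
    return (corrected - expected) / sd


def upper_tail_p(z):
    """One-tailed probability of a standard normal deviate >= z."""
    return 0.5 * erfc(z / sqrt(2.0))


# Table 5: 16 of the 49 PO, I, and MA subjects had Satisfactory
# attendance; 6 of the 7 EA subjects did.
p0 = 16 / 49
z = binomial_z(6, 7, p0)
p_one_tailed = upper_tail_p(z)
print(round(z, 2), round(p_one_tailed, 4))
```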

Hypotheses 4 and 5. It was also hy-
pothesized that EA subjects would have
better school-conduct records than the 
other subjects. Reports were available for 
57 subjects and, as in the case of attend- 
ance, the conduct reports were classified 
as Satisfactory and Unsatisfactory. Data 
were available on six of the EA group; 
50% had Satisfactory ratings as compared 
with 22% of the PO group, 15% of the I 
group, and 40% of the MA group. The 
difference between the EA and MA groups 
had a p of .258 when tested with the 
Fisher Exact Probability Test, while the 
difference between Group EA and the rest 
of the sample had a p of .166 when tested 
with the binomial test. 

Hypothesis 6. For the EA group, aggres- 
sion was assumed to be ego alien rather 
than ego syntonic. If so, the assault would 


TABLE 7
INCIDENCE OF SOLITARY VERSUS GROUP
OFFENDERS FOR EXTREME ASSAULTIVE
AND MODERATE ASSAULTIVE GROUPS

                   Extreme       Moderate
                   assaultive    assaultive    Total
Alone                  6              4          10
Group                  3             17          20
Total                  9             21          30


Note.—p = .021. 


be apt to be a furtive act committed while 
alone rather than a socially acceptable act 
committed while with others. Accordingly, 
it was hypothesized that the EA group 
would have a greater proportion of offend- 
ers in which the defendant and victim were 
alone at the time of the offense than would 
the MA group. In Table 7 the data are 
presented.

The data show that while two thirds of 
the EA offenders were alone with their 
victim, less than 20% of the MA subjects
were. The Fisher Exact Probability Test 
resulted in a p of .021. This finding is 
interpreted as indicating that physical ag- 
gression is more socially acceptable for the 
MA subjects. [An alternative explanation
would be that the MA subjects are simply 
more outgoing and friendly than the EA 
subjects. This hypothesis, however, is con- 
traindicated by the ratings scales to be de- 
scribed below (Hypotheses 11 and 12), on 
which it was found that, on the contrary, 
the EA subjects were rated as being sig-
nificantly more cooperative (p = .01), ami-
able (p = .051), and friendly (p = .02)
than the MA subjects when their social 
interactions were observed during the first 
10 days of custody.] 

Summary of Predetention Data. Six pre- 
dictions were made regarding behavior oc- 
curring prior to any judicial action. All the 
relationships were in the predicted direc- 
tion, and three were highly significant 
while another attained marginal signifi- 
cance. These data indicate that even before 
coming into custody the EA boys behaved
in a manner consistent with the notion that
they are overcontrolled and inhibited in the
expression of antisocial tendencies relative
to other groups of delinquents.¹²


BEHAVIOR IN DETENTION 


While in detention awaiting court hear- 
ings, the subjects, like all other boys in 
Juvenile Hall, were assigned to one of four 
custodial units. Each unit contained ap- 
proximately 40 boys and was supervised 
from 7:30 a.m. to 11:30 p.m. by two sets 
of two counselors working 8-hour shifts. 
These men were with the boys constantly 
during the daylight hours and as a routine 
matter observed each boy’s behavior and 
interactions during recreation periods, 
sports activities, meals, and work assign-
ments. For the boys included in the study,
each counselor filled out a behavior check 
list and a set of rating scales on the third 
and tenth day of the boy’s detention and a 
Gough Adjective Check List on the tenth 
day. These first two instruments are dupli- 
cates of those used by Mussen and Naylor 
(1954) and are reproduced in Appendixes 
C and D. 

The behavior check list, originally de- 
vised by Naylor (1952), listed 13 categories 
of aggressive behavior (See Appendix C). 
Each counselor in the unit checked off each 
category of aggressive behavior he had ob- 
served each subject engage in. Because of 
days off, vacations, and sick leave, the 
counselor population was rather fluid, re- 
sulting in anywhere from 7 to 14 individuals 
coming in contact with each boy. The num- 
ber of reports received on each subject 
varied accordingly, so the number of re- 
ports listing a specific category of behavior 
for a subject was divided by the total num- 
bers of reports submitted on that subject 
and multiplied by 100 yielding a percentage 
score. Thus, if a boy had nine behavior 
check lists submitted on him, and three 
listed Physical Attack, his score on this
variable was 33.3. 


¹² These data might also be interpreted as reflect-
ing extreme undercontrol by the MA group rather
than overcontrol by the EA group. The Adjective
Check List and Holtzman Inkblot Technique data
reported below (Tables 8 and 9) contraindicate this
interpretation, however, as do the significant differ-
ences found when the EA group was compared
with the other three groups combined.


Seven of the categories on the behavior 
check list (Bragging; Teasing; Saucy, Im- 
pertinent; Insulting, Name Calling; Ridi- 
culing, Mocking; Verbal Castigation; and 
Malicious Gossip) seemed to reflect verbal 
aggression so the percentage scores for these 
categories were added to give a score for 
Total Verbal Aggression. In like manner, 
the scores for five categories (Physical 
Attack, Threatening, Bullying, Destruc- 
tive, and Temper Tantrums) seemed to re- 
flect physical aggressiveness, and these 
were combined into a score for Total Physi-
cal Aggression.¹³
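
The scoring just described can be sketched as follows. The category names are those listed in the text; the function names and the representation of each counselor's check list as a set of checked categories are hypothetical choices made only for this illustration.

```python
VERBAL = ["Bragging", "Teasing", "Saucy, Impertinent",
          "Insulting, Name Calling", "Ridiculing, Mocking",
          "Verbal Castigation", "Malicious Gossip"]
PHYSICAL = ["Physical Attack", "Threatening", "Bullying",
            "Destructive", "Temper Tantrums"]


def percentage_scores(reports):
    """Percentage of a subject's check lists on which each category
    was checked. Each report is the set of categories one counselor
    checked for that subject."""
    n = len(reports)
    return {category: 100.0 * sum(1 for r in reports if category in r) / n
            for category in VERBAL + PHYSICAL}


def total_scores(scores):
    """Sum the percentage scores into Total Verbal Aggression and
    Total Physical Aggression."""
    verbal = sum(scores[c] for c in VERBAL)
    physical = sum(scores[c] for c in PHYSICAL)
    return verbal, physical


# Example from the text: nine check lists, three listing Physical Attack.
reports = [{"Physical Attack"}] * 3 + [set()] * 6
scores = percentage_scores(reports)
verbal_total, physical_total = total_scores(scores)
print(round(scores["Physical Attack"], 1))  # 33.3
```

Because the category scores are added, the totals can exceed 100, which is consistent with the group means shown in Table 8.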

Hypotheses 7 and 8. It was hypothesized 
that EA subjects would be lower than the 
combined contrast groups on Total Verbal 
Aggression (Hypothesis 7) and that the EA 
subjects in particular would be lower than
the I group (Hypothesis 8), which was ex-
pected to be high in verbal aggressiveness.
The data are presented in Table 8. As ex- 
pected, Group I had the most Verbal Ag- 
gressiveness and Group EA the least. 

When Hypothesis 7 was evaluated by 
means of the Mann-Whitney U Test, an 
exact probability of .058 was obtained.¹⁴
In the analysis of Hypothesis 8, the EA 
group was contrasted with Group I and 
the difference found to be significant with 
p < .05. 
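
For readers unfamiliar with the rank test, a minimal sketch follows. It computes the Mann-Whitney U statistic and an exact one-tailed probability by enumerating every possible assignment of the pooled scores to the two groups, which is practical only for very small samples. The function names and the scores are hypothetical illustrations, not data or procedures from the study.

```python
from itertools import combinations


def mann_whitney_u(x, y):
    """U for sample x: the number of (x, y) pairs in which the x score
    exceeds the y score, with ties counted as one half."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in x for b in y)


def exact_one_tailed_p(x, y):
    """Probability, under the null hypothesis, of a U as small as or
    smaller than the one observed for x (that is, x tending lower)."""
    pooled = list(x) + list(y)
    observed = mann_whitney_u(x, y)
    indices = range(len(pooled))
    as_extreme = 0
    arrangements = 0
    for chosen in combinations(indices, len(x)):
        chosen_set = set(chosen)
        first = [pooled[i] for i in chosen]
        second = [pooled[i] for i in indices if i not in chosen_set]
        arrangements += 1
        if mann_whitney_u(first, second) <= observed:
            as_extreme += 1
    return as_extreme / arrangements


# Hypothetical scores for a small group predicted to score lower.
low_group = [1, 2, 4]
other_group = [3, 5, 6, 7]
u = mann_whitney_u(low_group, other_group)
p = exact_one_tailed_p(low_group, other_group)
print(u, round(p, 3))
```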

Hypotheses 9 and 10. It was predicted 
that EA subjects would be the lowest on 
Total Physical Aggression on the behavior 
check list (Hypothesis 9) and, moreover, 


¹³ The thirteenth category, used by Mussen and
Naylor (1954), “Running Away,” did not appear 
to fit in either of these categories, and indeed 
appears to be fairly nonaggressive in nature. There- 
fore, it was not included in the analysis. 

¹⁴ In this analysis, as in most of those to be
reported, the EA subjects were contrasted with the 
other three groups combined. This was done be- 
cause, for these hypotheses, the writer was always 
predicting that Group EA would be higher or 
lower than the other groups; other differences ob- 
tained between the means of the other groups 
were consequently not relevant to the primary 
concern of the study. The reader may also have 
noted that the total number of subjects in this 
analysis is 75. Inevitably, with this many variables 
and sources of data, there was some missing in- 
formation for practically every variable. In this 
case, one boy was released by the court after his 
testing and interview were completed, but before 
observational data could be collected. 


TABLE 8
MEAN SCORES OF THE FOUR GROUPS ON MEASURES OF BEHAVIOR IN DETENTION WITH p VALUES OF THE
TESTED DIFFERENCES

                                                                              p values
                                            Group scores              EA versus PO,   EA versus
Measure                             EA       PO        I        MA    I, and MA       MA
Behavior check list
  Total verbal aggression     M    74.41   113.44   137.43   103.80      .058           <.05
                              SD   79.04    88.52    92.85    73.07
                              N     9       26       19       21
  Total physical aggression   M    27.04    32.78    44.17    51.02      .874           .255
                              SD   25.84    50.26    49.23    58.31
                              N     9       26       19       21
Combined rating scales        M    16.11    15.36    14.96    14.80      .06            .045
                              SD    1.97     1.98     1.52     1.77
                              N     9       26       20       21
Adjective check list
  Overcontrol index           M     2.42     0.56    −0.78    −0.57      .028           .055
                              SD    3.07     3.98     3.29     4.15
                              N     9       26       20       19


that in particular Group EA would be 
lower than Group MA (Hypothesis 10). 
While the differences were in the expected 
direction, they were far from significant. 

It should be remembered, however, that 
these ratings were made in a custodial 
setting. Not only was swift punishment 
administered for any physical aggression, 
but also the boy knew that his behavior 
in detention would undoubtedly influence 
the court disposition. Such external controls 
would tend to reduce the amount of ag- 
gression engaged in by any subjects of 
the Undercontrolled Aggressive type and 
hence work against the hypothesis. 


Rating Scales 


Hypotheses 11 and 12. The second meas- 
ure of overt behavior during detention was 
the set of 5-point rating scales devised by 
Naylor (1952) for a study of juvenile de- 
linquents. The set consisted of five bipolar 
scales ranging from an unfavorable trait
such as Uncooperative or Aggressive to a 
more passive or favorable one such as Co- 
operative or Submissive. Each scale was 
anchored at each point by a brief descrip- 
tion of the behavior which would earn such 
a rating. (See Appendix D.) 


These five rating scales were combined 
into a global scale with a possible range 
from 5 to 25. High scores reflected co- 
operativeness, amiability, submission, do- 
cility, and friendliness while low scores 
indicated an uncooperative, quarrelsome, 
aggressive, rebellious, and antagonistic at- 
titude. It was hypothesized that the EA 
group would have a higher score than the 
other groups combined (Hypothesis 11) 
and the MA group in particular (Hypothe- 
sis 12). This expectation was confirmed 
with the EA group having the highest score 
and the MA group the lowest. Hypothesis 
11 had a p of .06 while Hypothesis 12 had 
a p of .045. (See Table 8.)¹⁵


Gough Adjective Check List 


Hypotheses 13 and 14. On the tenth
day of detention, each counselor checked 
all those adjectives on the 300-item 
Gough Adjective Check List which he felt 
were descriptive of the boy. Forty of the 
adjectives were selected for study. Twenty 
of these were adjectives which seemed de- 


¹⁵ Not only was the EA group rated most favor-
ably on the combined scales, but it also had the
most favorable rating on each of the individual
scales as well.

scriptive of the Chronically Overcontrolled
person such as “meek,” “self-controlled,” 
“conscientious,” and “withdrawn.” Twenty 
others seemed descriptive of the Undercon- 
trolled Aggressive type, including such 
terms as “aggressive,” “hostile,” “irritable,” 
and “assertive.” (See Appendix E.) The 
adjective check list submitted by each 
counselor was scored by counting the num- 
ber of adjectives of each type. Then an 
Overcontrol index was created by sub- 
tracting the Undercontrolled Aggressive 
adjectives from the Chronically Overcon- 
trolled ones. It was hypothesized that the 
EA group would have the highest score on 
this index. 
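
The index for a single check list can be sketched as follows. Only the eight adjectives quoted in the text are used here; the full lists of 20 adjectives each appear in Appendix E and are not reproduced, so the two sets below are partial and serve only as an illustration. The function name is hypothetical, and the way in which the lists from several counselors were combined for each boy is not shown.

```python
# Partial adjective sets: only the examples quoted in the text.
OVERCONTROLLED = {"meek", "self-controlled", "conscientious", "withdrawn"}
UNDERCONTROLLED = {"aggressive", "hostile", "irritable", "assertive"}


def overcontrol_index(checked_adjectives):
    """Number of Chronically Overcontrolled adjectives checked minus
    the number of Undercontrolled Aggressive adjectives checked.
    Adjectives outside both sets are ignored."""
    checked = set(checked_adjectives)
    return len(checked & OVERCONTROLLED) - len(checked & UNDERCONTROLLED)


# Hypothetical check list submitted by one counselor.
example = ["meek", "withdrawn", "conscientious", "irritable", "tall"]
print(overcontrol_index(example))  # 2
```

A positive index indicates that more overcontrol adjectives than undercontrol adjectives were checked, and a negative index indicates the reverse.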

The data were in the predicted direction 
with a p of .028 when Group EA was com- 
pared with the combined contrast groups 
and .055 when Group EA was compared 
with Group MA. (See Table 8.) 

The Overcontrol index was particularly 
noteworthy in another regard. The three 
contrast groups, MA, PO, and I, all had 
quite similar scores as would be expected 
if they shared essentially the same values 
and orientation. The EA group, on the 
other hand, had a score almost five times 
that of the next highest contrast group. 
This is consistent with the basic thesis that 
the EA offender category is apt to include 
a distinctly different type of person than 
other offenders. Secondly, this difference is
shown by the Overcontrol index data to be 
clearly in the direction of excessive control 
on the part of the EA subject rather than 
extraordinary aggressiveness on the part of 
the MA subject. 


Summary of Behavior in Detention 


On all of the devices used to assess be- 
havior during detention, the EA boys were 
measured as being less aggressive and more 
controlled than were the members of the 
other three groups. The consistency of these 
results adds confidence to the reliability of 
these observations. 

A logical question is whether or not this 
reliability was the product of some set or 
unconscious bias among the observers. In 
this regard it is important to recall that 
the counselors who made the ratings had 
no idea what hypotheses were being tested, 


having been told merely that it was a study 
on aggressiveness. Knowing this and know- 
ing the offenses with which each boy was 
charged, one would expect, if anything, a 
set to rate the boys charged with extremely 
assaultive crimes as being more aggressive. 
If such a set did exist it would operate 
against the actual hypotheses, so that the 
differences obtained were in spite of such a 
set rather than because of it. 

Secondly, the obtained differences were 
also in spite of the fact that the boys were 
confined in a custodial setting in which 
swift sanctions were levied against any ag- 
gressive behavior. The effect of this would 
be to curb the aggressiveness of the Under- 
controlled Aggressive boys by providing ex- 
ternal controls in place of their deficient 
internal controls. It should have little 
effect on Chronically Overcontrolled people.
Thus, the setting operated to reduce differ-
ences, and it is likely that if the observa- 
tions could somehow have been made in 
the natural milieu the differences might 
have been even more striking. 


RESULTS OF THE PSYCHOLOGICAL 
ASSESSMENT 


Structured Interview Data 


In order to obtain information concern- 
ing the subjects' attitudes toward aggres- 
sion and their customary behavior “on the 
street” (i.e., when not in custody), each boy
was interviewed. The structured interviews 
used by Bandura and Walters (1959) were 
selected for this purpose because reliable 
scales had been devised by those authors 
so that responses could be quantified. Time 
limitations precluded administering the en- 
tire schedule, so the interview consisted of 
the following questions: 1, 3, 4, 5, 6, 9, 10, 
13, 22, 31, 33, 34, and 38 (Bandura & 
Walters, 1959, Appendix B). For the most 
part these questions asked the subject how 
he behaved in various situations (i.e., “How 
do you deal with the kind of guy who likes 
to push his weight around?") or about the 
amount of aggressive behavior he had en- 
gaged in in the past (i.e., “How often have 
you gotten into a fight since you've been 
at high school?"). 

The interviews were scored on several of
the scales of physical aggression devised by
Bandura and Walters (1959): physical ag-
gression against peers, against teachers,
against mother, and against father. A single
scale of "physical aggression against au-
thorities" was devised by computing the
mean for the latter three scales.

Hypotheses 15 through 18. It was hy- 
pothesized that the EA group would receive 
the lowest scores on these two scales of 
physical aggression. This prediction was
not upheld in the case of physical aggres- 
sion against peers (Hypotheses 15 and 16). 
On this variable the Property Offenders re- 
ported they had engaged in the least ag- 
gression; while the score for the EA group 
was somewhat lower than that for the MA 
group, the difference was far from signifi- 
cant. 

In the case of reported aggression against
authorities (Hypotheses 17 and 18) the EA 
group did receive the lowest score. The p 
value for the difference between the EA 
group and the other groups combined was 
.065, while the difference between the EA 
and MA groups had a p of .082. 


California Psychological Inventory 


It was impossible to administer the CPI 
to every subject since many were unable 
to read it adequately. In a few cases the 
test was read to the subject, but limita- 
tions on staff time precluded this as a 
standard operating procedure. After scor- 
ing, all CPIs on which the Communality 
score was less than 20 were discarded as 
invalid on the basis of probable random
answering (Gough, 1960, p. 20). This left 
a total of only 46 valid CPI protocols from 
the total sample of 76. The bias introduced 
by this loss cannot be determined; how-
ever, the proportion of usable profiles was
quite similar for the four groups ranging 
from a low of 58% for the PO group to a 
high of 67% for the EA group. The probable 
effect was to eliminate the least coopera- 
tive and least intelligent subjects from each 
group. 

Since the scales on the CPI are such that 
higher scores reflect more positive traits, it 
was generally expected that the EA group 
would score highest on most of the scales. 


This expectation was upheld, since the EA 
group had the highest mean score on 13 of 
the 18 scales.¹⁶

Hypotheses 19 and 20. It was hypothe- 
sized that the EA group would be highest 
on the Self Control (Sc) scale and this was 
found to be the case. The significance of 
the difference between Group EA and the 
other groups had a p of .129. The exact 
probability of the difference between 
Groups EA and MA could not be deter- 
mined because of the small number of sub- 
jects involved, but the Mann-Whitney U 
value of 30.5 which was obtained was far 
above the value of 19 required for signifi- 
cance at the .05 level. 

It is noteworthy that the mean Sc score 
of 26.5 obtained by the EA subjects was 
somewhat above the usual high school 
norms. This is what would be expected if 
the EA group contained some subjects who 
were overcontrolled and not just more con- 
trolled than the other delinquents. 

Other CPI Relationships. In examining 
the means on the other scales, it was noted 
that the EA group scored markedly higher 
on the Responsibility (Re), Well Being 
(Wb), Tolerance (To), Achievement by 
Independence (Ai), Intellectual Efficiency
(Ie), and Flexibility (Fx) scales. If these 
differences had been anticipated and direc- 
tional predictions had been made, some of 
them would have been significant. The one- 
tailed p for the difference between Group 
EA and the other groups on the Re scale, 
if it had been predicted, would have been 
.12, for the Wb scale .01, for the To scale
.07, for the Ai scale .07, for the Ie scale .01,
and for the Fx scale .06. 

This score pattern indicates that the 
members of the EA group tend to be more 
conscientious, responsible, and alert to 
ethical or moral issues than the members 
of the other groups (Re). They are particu- 
larly oriented toward doing well in school 
and tend to be more mature and thorough 
in their approach to academic tasks (Ai 
and Ie). They tend to be more alert, am- 
bitious, and enterprising and are more 
likely to value work and effort for their 

¹⁶ Because of lack of independence among the
CPI scales, no statistical test of this prediction was
possible.

own sakes (Wb). They appear to be more
verbal, tolerant, and clear thinking (To), 
although they can be sarcastic and cynical 
in their verbal behavior (Fx), and, of 
course, as predicted, they are less impulsive 
and more controlled (Sc). All in all, they 
tend to have more of the traits which are 
valued among the middle-class members of 
our society. This pattern is consistent with 
the personality pattern hypothesized for 
the Overcontrolled type as opposed to the 
Undercontrolled type. 


Rosenzweig Picture-Frustration Study 


The Rosenzweig P-F Study has been used 
in a number of studies of aggression (e.g., 
Weinberg, 1953) and was therefore in- 
cluded in the present investigation. Since 
it is a relatively unsubtle instrument when 
administered in the context of a court clinic, 
there were some reservations about whether 
or not the results might be influenced by 
dissimulation on the part of the subjects. 

Hypotheses 21 and 22. It was hypothe- 
sized that the EA group would be the low- 
est of the four on the Extrapunitiveness 
scale of the P-F. This did not prove to be 
the case. The EA group did have a lower 
score than the MA group as had been pre- 
dicted (Hypothesis 22), but the high p 
value indicated that this was only a chance 
relation. 

In order to determine the validity of 
these results, the mean scores of the four 
groups were compared with the normative 
scores for boys aged 14-19 reported by 
Deming (1960). It was found that the 
mean scores for all four of the delinquent 
groups fell below the mean score of 46.4 
reported by Deming. This suggested that 
dissimulation was influencing the results. 

In order to investigate this notion fur- 
ther, the Extrapunitiveness scores for the 
46 boys for whom valid CPIs were avail- 
able were correlated with the CPI Good 
Impression scale, which was designed to 
detect “faking good.” A correlation of 
—.46 was obtained which was highly sig- 
nificant (p < .01). These data would indi- 
cate that in a court setting such as this, 
the P-F results can be markedly influ- 
enced by defensiveness (Megargee, 1964). 
It is questionable, therefore, how ade- 


quately the P-F tested the basic hypothe- 
ses. 
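
The correlation reported above can be illustrated with a short sketch. The product-moment formula is standard, but the text does not state which significance test was applied to the coefficient; the t statistic below is an assumption made for illustration, and the function names and the example scores are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_from_r(r, n):
    """t statistic with n - 2 degrees of freedom for testing whether a
    correlation r based on n pairs differs from zero (assumed test)."""
    return r * sqrt((n - 2) / (1.0 - r * r))


# Hypothetical scores with a perfectly inverse relation.
r_example = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])

# The reported coefficient of -.46 for 46 boys.
t_reported = t_from_r(-0.46, 46)
print(round(r_example, 2), round(t_reported, 2))
```

Under this assumed test, a coefficient of −.46 with 46 cases corresponds to a t of roughly −3.4 with 44 degrees of freedom.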


Thematic Apperception Test 


Hypotheses 23 and 24. The TAT was
administered in the normal fashion with 
the exception that the stories were tape- 
recorded rather than written down by the 
examiner. An effort was made to employ 
the same methods used by Mussen and 
Naylor (1954) in their study of the rela- 
tion between fantasy and overt aggression. 
The cards used (1, 3BM, 4, 6BM, 7BM, 
8BM, 12M, 13B, 14, and 18BM) were the 
same, and the cards were scored for n Agg 
in the same manner using verbatim type- 
scripts prepared from the tape recordings. 

The hypothesis that the EA group 
would be lowest on n Agg was not sup- 
ported by the data. In fact, Group MA
proved to have the lowest scores rather 
than Group EA. 


Holtzman Inkblot Test 


The Holtzman Inkblot Test (HIT) was 
administered to all the subjects using 
standard procedure. All responses were 
tape-recorded, and scoring was done by 
the investigator from verbatim typescripts. 
The HIT was selected rather than the 
Rorschach because its scoring procedure is 
more amenable to statistical manipulation 
and because the use of 45 cards with one 
response to each was apt to elicit a larger 
body of responses. Both content and de- 
terminant scores are obtainable from the 
test, and both were used.

Hypotheses 25 and 26. It was hypothe- 
sized that the EA group would be lowest 
on the Hostility scale (Hs) and that 
Group EA would be significantly lower 
than Group MA. This is a content scale 
based on the one devised by Murstein 
(1956) for the Rorschach. The data in 
Table 9 show that on the contrary, the EA 
group had the highest Hs score, so both 
hypotheses were disconfirmed. (The sig- 
nificance of this reversal was tested and 
found to be insignificant, with a two- 
tailed p of .267.) 

Hypotheses 27 and 28. The last hypothe- 
ses to be tested were related to the determi-
nants of Movement and Color. In the
Klopfer system, Movement responses are
interpreted as indicating "an inner system 
of conscious values of one kind or another, 
in terms of which the person tends to con- 
trol his behavior, to guide his satisfactions, 
and to postpone his gratifications [Klopfer, 
Ainsworth, Klopfer, & Holt, 1954, p. 262]." 
If Movement responses on the HIT have a 
similar meaning, it would be expected that 
the EA group would be highest on this vari- 
able. 

The use of color, on the other hand, is 
associated with immature and impulsive 
behavior, particularly when the color is 
used as a primary determinant with little 
attention paid to the form elements of the
blot. Since the HIT Color score assigns the
heaviest scoring weights to these uncon- 
trolled color responses (Holtzman, 1961) it 
would be expected that the EA group 
would have the lowest Color scores. 

Since the Overcontrolled subjects should 
be relatively high on Movement and low on 
Color a simple index was used in which 
each subject’s Color score was subtracted 
from his Movement score. It was predicted 
that the EA group would have the highest 
score on this Movement-Color index. 

This prediction was borne out by the 
data. The Movement-Color score of the 
EA group was markedly higher than those 
of the other three groups. The p value for 


TABLE 9
RESULTS OF DATA COLLECTED DURING THE PSYCHOLOGICAL ASSESSMENT

                                                                                      p value of tested comparisons
                                                            Group scores              EA versus PO,   EA versus
Instrument            Variable name                      EA      PO      I      MA    I, and MA       MA
Structured interview  Reported physical aggression  M    3.39    3.11    3.45    3.50  Not testedᵃ     .919
                      against peers                 SD   1.24    1.32    1.31    1.16
                                                    N    9      23      20      20
                      Reported physical aggression  M    1.08    1.20    1.34    1.19  .065            .082
                      against authorities           SD    .22     .20     .36     .30
                                                    N    9      23      20      20
CPI                   Self-control                  M   26.5    22.5    21.8    23.4   .129            >.05ᵇ
                                                    SD   6.13    6.42    6.68   10.14
                                                    N    6      15      12      13
Rosenzweig P-F study  Extrapunitiveness             M   40.6    39.71   35.46   42.24  Not testedᵃ     .48
                                                    SD  17.13   13.91   15.61   19.75
                                                    N    9      26      20      18
TAT                   Need aggression               M    8.33    8.52   10.05    8.10  Not testedᵃ     Not testedᵃ
                                                    SD   2.79    3.50    5.08    4.52
                                                    N    9      23      20      19
HIT                   Hostility                     M   10.33    6.84    8.80    7.86  .267ᶜ           Not testedᵃ
                                                    SD   6.00    5.02    6.10    5.37
                                                    N    9      26      19      21
                      Movement-color index          M   14.00    8.08    5.50    8.18  .061            .059
                                                    SD   8.19   14.91   23.99   12.94
                                                    N    9      26      19      21
                      Number of pure color          M    1.11    1.76    2.85    1.86  .111            .045
                      responses                     SD   1.59    1.97    5.97    1.46
                                                    N    9      26      19      21

ᵃ Data not in hypothesized direction.
ᵇ Exact probability test not possible because of small N’s.
ᶜ Two-tailed test used to test reversal.

the difference between the EA group and
the rest of the sample combined was .061, 
while the p value for the comparison be- 
tween the EA and MA groups was .059. 
This pattern of scores, which resembles 
that found for the Adjective Check List 
Overcontrol index, lends further support to 
the notion that while the behavior and 
dynamics of the MA group are similar to 
those of other offenders those of the EA 
group are qualitatively different as would 
be the case if a different personality type 
was involved. 

In order to determine whether the rela- 
tively greater use of Color by the other 
three groups was the result of the use of 
uncontrolled color responses as opposed to a 
large number of well-controlled color re- 
sponses, the incidence of pure color re- 
sponses in all four groups was calculated. 
The data in Table 9 indicate that, as ex- 
pected, Group EA had the lowest number 
of pure color responses. The p of the difference
between the EA group and the rest of
the sample was .111, while the p of the 
difference between the EA and MA groups 
was .045. 


Summary of the Results of the Psychological Assessment


The results of the psychological assess- 
ment are not as clear-cut as were those of 
the preoffense behavior or the behavior 
while in detention. No support was ob- 
tained from the Rosenzweig P-F Study, 
the TAT n Agg measure and the HIT Hos- 
tility scale. The interviews indicated that 
the EA group was less aggressive to au- 
thorities but the differences in the amount 
of aggression against peers reported were 
not significant. The CPI data were in the 
predicted direction but with marginal p
values. The best support came from the 
Movement-Color index on the HIT, on 
which the EA group displayed substantially 
more impulse control. 

It is not too surprising that the behavior 
on the psychological tests is not as clear-cut 
as that observed in detention or found in 
the case history. Studies such as those of 
Kostlan (1954) and Little and Shneidman 
(1959) have demonstrated the greater 
validity of case history data as opposed to
psychological tests. This tendency to find
greater clarity in direct measures as op- 
posed to tests would probably be accentu- 
ated in a correctional setting such as the 
one in which these data were collected. A 
delinquent being assessed to aid in deter- 
mining a court disposition is naturally 
going to be quite guarded and defensive 
during a psychological examination, but is 
less likely to be able to maintain a sub- 
terfuge over 10 days of interaction with 
other delinquents. 

Within the psychological test data, the 
more obvious the instrument, the more 
likely it is that a defensive attitude could 
alter the results. This is consistent with 
the fact that the most obvious tests, the 
P-F study, the TAT, and the Hostility 
scale of the HIT, failed to show the predicted
patterns, but the less easily distorted
measures such as the empirically
derived CPI Self-Control scale and the 
Movement-Color index of the inkblot test 
did show the hypothesized patterns. 


Discussion 


The first issue to be discussed is whether 
or not the data presented above support 
the writer’s hypotheses. It will be recalled 
that the basic hypothesis was that there
are two personality types involved in anti- 
social aggression: the Undercontrolled Ag- 
gressive type and the Chronically Over- 
controlled type. The former may commit 
aggressive responses of any intensity de- 
pending upon the immediate stimulus situa- 
tion, while the latter tends to inhibit ag- 
gressive responses until they break through 
in what the writer has called an “extremely 
assaultive response” in which the very life 
of the victim may be jeopardized. It fol- 
lowed from this hypothesis that a group of 
extremely assaultive subjects would be as- 
sessed as less aggressive and more con- 
trolled, as a group, than would contrast 
groups of moderately assaultive and other 
nonassaultive delinquents because of the 
probable presence of Overcontrolled subjects
among the extremely assaultive
group, while the latter groups would be 
made up of the Undercontrolled type. 

In the study which was conducted to 
test this, the results were by no means unequivocal
in their support for the hypothesis.
Nevertheless, by and large, a review of
the data indicates consistent if not spectac- 
ular support for the writer’s hypotheses. 

Of the 28 hypotheses, 22 were in the pre- 
dicted direction with the EA group display- 
ing less aggression or more control than 
the other groups in general and the MA 
group in particular. Fourteen of these hy- 
potheses received some measure of statistical
support, with p values ranging from
.003 to .084. On only 1 of the 15 variables
was the EA group assessed as having the 
most hostility. This difference when tested 
did not approach significance. 
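
As a rough check on how unusual a 22-of-28 directional split would be under pure chance, a one-tailed sign test can be sketched as follows. This calculation is not part of the study's analysis, and it assumes the 28 hypotheses are independent, which they are not, since many were tested on the same subjects; the function name is hypothetical.

```python
from math import comb


def sign_test_upper_tail(successes, trials):
    """P(X >= successes) when X ~ Binomial(trials, 0.5)."""
    favorable = sum(comb(trials, k) for k in range(successes, trials + 1))
    return favorable / 2 ** trials


# 22 of the 28 hypotheses were in the predicted direction.
p_directional = sign_test_upper_tail(22, 28)
print(f"one-tailed sign test p = {p_directional:.4f}")
```

Given the dependence among the measures, the resulting value should be read only as an informal indication that the directional consistency is not well described by a coin-flip model.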

Because of the somewhat marginal significance
levels, we cannot consider the
case firmly established or definitely proven. 
However, the total pattern of the data sup- 
ports the proposed typology and certainly 
gives no support to the most prevalent 
opposing notion that all extremely assaul- 
tive delinquents are more aggressive and 
less controlled than other delinquents. 

However, as in the case of any experi- 
ment, it is possible to advance other ad hoc 
explanations. For instance, one might argue 
that the EA group, facing severer penalties 
for their offenses, behaved in a more con- 
trolled fashion during detention in order 
to impress the court and receive lesser 
penalties.* This hypothesis, however, is
not supported by the data. If this were
the case, then the EA group would show a
pattern of greater control after the offense
but not before. However, the predetention
measures also showed significantly more
social conformity on the part of the EA
group in the form of a lower recidivism
rate and a better school attendance record.

* The assumption that the EA subjects faced
severer penalties at the hands of the Juvenile
Court is itself false. The dispositions for juvenile
offenders can be dichotomized as (a) some form
of institutionalization for an indefinite period or
(b) probation in the community. Of the EA
sample, 67% were institutionalized, of the MA
sample 63%, of the PO sample 60%, and of the I
sample 40%. Thus, while it appears that the Incorrigible
subjects were less likely to be institutionalized
than were the others, it does not appear
that the penalties assigned the EA group were
much greater than those meted out to the rest of the
sample. Nevertheless, while the assumption of
greater penalties is false, the delinquents themselves
could have acted as if it were true.

It is also unlikely that temporary situa- 
tional constraint could cause significant 
differences on the Movement-Color index of 
the HIT. Moreover, undue concern over 
making a good impression, or even out- 
right dissimulation, probably would be re- 
flected in the CPI Good Impression scale. 
Yet the EA group’s mean score on this scale 
was almost identical to that obtained by 
Gough’s (1960) normative sample of 3,572 
high school males. It would therefore appear
that the notion that the results were
caused by temporary inhibitions resulting
from the judicial process is not consistent
with the data.

Another possible explanation of the re- 
sults is that the EA group appeared less 
aggressive because they had vented all 
their hostility during the commission of 
the offense. This would also account for 
the fact that all four groups obtained be- 
low-average scores on the Rosenzweig 
P-F Study, since the members of all four 
groups had probably released more aggres- 
sive tensions just prior to testing than had 
the normative high school sample cited by 
Deming (1960). 

There are two types of data in the study 
which would contraindicate this drive- 
reduction hypothesis. The first is the pre- 
offense behavior cited above. The second 
is the set of hypotheses in which the EA 
and MA groups were compared. While it 
might be argued plausibly that the EA 
group had undergone more drive reduction 
as a result of their offenses than other de- 
linquents in general, it would hardly ap- 
pear likely that the differences in amount 
of drive reduction between an assault 
rated as “extreme” and one rated as 
“moderate” would be sufficient to account 
for the consistently less aggressive scores 
of the EA as opposed to the MA group. 
(See Appendix B.) 

Another set of ad hoc hypotheses can be 
derived from the effects of the representa- 
tive design used in the selection of the 
assaultive sample. It will be recalled
that all the assaultive delinquents apprehended
over a 10-month period were included
in the study and that later they
were subdivided on the basis of the Aggression
scale into the EA and MA subgroups.
In accord with the basic principles
of representative design, no arbitrary re- 
strictions were placed on the subdivision of 
this population. Consequently the EA 
group had a greater percentage of white 
subjects than Group MA or Groups I and 
PO which were matched to the total assaultive
population on this variable. (See Table 1.)

It could be argued, therefore, that the
obtained data were the result of the differences
in racial balance among the four
groups. This would be particularly likely
to influence the ratings of detention behavior
if some form of stereotype, prejudice,
or halo effect were operating in the
raters, although it should be pointed out 
that at least half of the raters were them- 
selves Negro. In order to evaluate this 
possibility, the detention data were recalcu- 
lated separately for whites and Negroes. 
For both white and Negro subjects, the 
EA group was found to be assessed as least 
aggressive and most controlled on the meas- 
ures of behavior while in detention. With 
the reduced Ns, the p values were of course 
much higher than they had been for the
total sample. 

Another ad hoc hypothesis could focus on 
the differences in recidivism rate over the 
four groups. (It will, of course, be recalled 
that this difference was predicted in Hypothesis 1.)
However, it could be argued
that the differences reported in detention 
behavior were the result of a negative halo 
effect for the recidivists on the part of the 
counselors who were, of course, aware of 
the past records. In order to evaluate this, 
the seven first offenders in the EA group 
were compared with the first offenders in 
the other three groups on the measures of 
detention behavior. Even when the study 
was limited to first offenders, the same 
directional differences were noted in the 
detention reports. Once again, the reduced 
Ns increased the p values.

It would therefore appear that if the 
prevalent assumption that assaultive criminals
are all undercontrolled and highly
aggressive is to be maintained, it would be 
necessary to account for the data obtained
by resorting to various ad hoc explanations.
However, the most plausible of
these ad hoc explanations have been examined,
and data have been adduced to
demonstrate their inadequacy. 

There is one final interpretation of these 
data which would allow a person to pre- 
serve the simple notion that only one 
personality type is involved in assaultive 
offenses. This would be to dismiss the pres- 
ent data as simply chance phenomena which 
do not need explanation. It is, of course, 
up to the individual to decide how much 
data he will require before he will abandon 
the null hypothesis. It is also true that it 
can be hazardous to rely on directional data 
and marginal p values. Nevertheless, the 
fact that the present hypothesis predicted 
results directly opposite those implied by 
the commonly accepted alternative theory, 
that these results were quite consistent over 
a wide range of dependent variables rang- 
ing from recidivism rate before the offense 
to Movement-Color index on the HIT 
after the offense, and that the significance 
levels obtained were achieved despite the 
crudity of criminal offense as an independent
variable, all combine to suggest
strongly that if the null hypothesis is not 
rejected, one would be in serious danger of 
committing a Type 2 error. 

The present investigator, as might be 
anticipated, is more inclined to risk a Type 
1 error by rejecting the null hypothesis 
and accepting the notion that two types of 
people may be involved in assaultive 
offenses. If this position is adopted, one 
implication is that prevailing conceptions
of aggression are not always applicable to
the dynamics of the extremely aggressive
person. It would appear that extreme ag- 
gression is a phenomenon which should be 
studied in its own right and not through 
extrapolation from studies of milder forms 
of aggression. As is obvious from the pres- 
ent study, this type of investigation pre- 
sents many methodological difficulties. The 
fact that such investigations must take 
place in a judicial setting not only limits 
the procedures that can be used without 
upsetting institutional routines but also 
will inevitably influence the motivation 
and set of the subjects. Moreover, the ever-present
psychological problem of the adequacy
of our measuring instruments is quite
obvious when attempts are made to differ- 
entiate levels of hostility within a de- 
linquent or criminal sample (Megargee, 
1964; Megargee & Mendelsohn, 1962). 
Despite these difficulties, however, the pres- 
ent study indicates that if extreme aggres- 
sive behavior is to be understood, these 
problems will have to be coped with since 
the study of milder aggressive behavior is 
apt to be misleading. 

The study also suggests certain clinical 
problems. The first is in the area of predicting
assaultive behavior. It would appear
that there is little difficulty in diagnosing
or predicting the behavior of the
Undercontrolled Aggressive type. His whole 
life style should show a pattern of re- 
curring aggression and violence, and there 
is little doubt that without some dramatic 
intervention, this style will continue. 

The Chronically Overcontrolled type 
presents a much more difficult problem, 
however. In the first place, lay personnel 
tend to overlook the potential pathology 
of the quiet, retiring person, so that such
persons are much less often referred for evaluation
by parents, clergy, or teachers than
is the Undercontrolled Aggressive type.

Even if an Overcontrolled person is re- 
ferred, the clinician must somehow dis- 
criminate between the Overcontrolled pa- 
tient who is potentially assaultive and the 
one who is not dangerous. This may be 
nearly impossible for it can depend a great 
deal upon environmental events and frustrations
which the clinician cannot anticipate.

However, when assaultive people are ex- 
amined after the offense, there are some 
indications, in retrospect, that certain cues 
may have been present which would have 
indicated potential violence. One is a pre- 
occupation with violence in fantasy. An 
11-year-old boy who fatally stabbed his 
brother was a cartoonist for his school 
paper, and after the offense people sud- 
denly recalled a cartoon in which his chief 
character was taking a fencing lesson and 
stabbed his instructor to death. The boy 
who obtained the highest rating on the Aggression
scale in the present study had
shot at his parents from ambush, killing
his mother. (See Case No. 07009, Appendix 
B.) Several months earlier he had thought 
of writing a novel about a boy who became 
so disgusted with his parents that he just 
killed them. Despite these indications of a 
preoccupation with aggression in fantasy, 
the apperceptive measures used in the 
present study generally showed no differ- 
ence in aggressive ideation among the four 
groups. Of course these tests were ad- 
ministered in custody after the offense, 
and a different pattern might have been 
elicited at some other point in time. 

There are some indications in the pres- 
ent study, as well as from other data, 
that the distinctive pattern that distin- 
guishes the potentially assaultive Overcon- 
trolled person is outward conformity 
coupled with inner alienation (Megargee, 
1965). For instance, despite the docile, 
controlled pattern displayed by the EA 
sample in the present study, their Socializa- 
tion scores on the CPI were in the de- 
linquent range and no different from those 
of the other delinquent samples. While it 
will be recalled that the CPI data were 
biased because of a high percentage of in- 
valid profiles, nevertheless this could indi- 
cate that the Chronically Overcontrolled 
assaultive person shares the typical de- 
linquent’s feelings of futility, disgust, and 
alienation, but instead of acting out these 
feelings he customarily rigidly represses 
them. Data collected in connection with 
the development of an MMPI scale de- 
signed to detect the Overcontrolled assaul- 
tive offender also are consistent with this 
pattern. (See Footnote 5, above.)

Whether we identify the assaultive per- 
son before or after the offense, the ques- 
tion which naturally arises is how he can 
best be treated so as to make him less 
dangerous to others. Early identification 
not only has the obvious advantage of 
possibly preventing an assaultive act, but 
also allows greater freedom to choose an 
appropriate form of therapy. After an 
offense occurs, legal considerations and 
public opinion greatly restrict the range of 
possible choices. 

In the case of the Undercontrolled Aggressive
type, the basic therapeutic task is
to increase the inhibitions against aggressive
acting out. Normally such inhibitions
are acquired through identification with a
well-socialized parent figure with conse- 
quent introjection of his values. However, 
in the case of the Undercontrolled Aggres- 
sive person this has not taken place. If he 
is treated early enough, it might be possi- 
ble to foster the growth of such controls by 
providing a parent substitute in the form 
of a case worker, clergyman, “big brother,”
or probation officer. Often, however, this 
is not feasible, so an alternative program 
must be used. This usually consists of pro- 
viding external controls with automatic re- 
wards for approved behavior and punish- 
ments for disapproved behavior. In order 
to control the schedules of reinforcement 
and protect society during the learning 
process, institutionalization is generally in- 
dicated. Such an institution may be called 
a camp, a school, a jail, or a penitentiary, 
but the basic philosophy and the basic 
program are usually the same. 

Unfortunately, such programs are less 
effective than might be desired. It is difficult,
even in an institutional setting, to
schedule rewards and punishments optimally,
with the result that most inmates
are on a partial-reward schedule when it
comes to the expression of aggression. Instead
of learning to inhibit aggression,
they are more likely to form a discrimination
and inhibit aggression only when they are likely
to be caught. Moreover, the frustrations of
life in an institution, as well as the life of an 
ex-convict, are likely to increase the instiga- 
tion to aggression enough to offset any in- 
crease in inhibitions. 

The most appropriate treatment for the 
Chronically Overcontrolled assaultive per- 
son, on the other hand, would be some 
form of psychotherapy. The goal of such 
therapy would be to reduce excessive inhibi- 
tions so that the individual can learn to 
acknowledge and accept his feelings of 
hostility and learn ways of expressing them 
which would allow some measure of need 
satisfaction while still not posing too great 
a threat to society.

If the potentially assaultive Overcontrolled
person is detected prior to an aggressive
outburst, such a treatment program
can be instituted fairly easily. However,
it can be a delicate therapeutic task
to remove such inhibitions in a person with 
a great deal of repressed hostility without 
precipitating either a psychotic break or ex- 
cessive acting out. 

Postoffense treatment on the other hand 
must cope not only with the problem of 
guilt, but also with limitations imposed by 
judicial procedures. If an extremely assaul- 
tive offense has been committed, it is likely 
that the patient will have to be treated in 
some form of penal institution. As noted 
above, the program of such institutions is 
to reward control and conformity and to 
punish assertiveness or aggression. This
means that the goals of the institutional 
program and the therapeutic program will 
be at complete odds with each other. The
patient will have few chances to practice 
assertive and mildly aggressive responses 
in a setting in which they are apt to be 
rewarded. 

If an attempt were made to match the 
treatment program to the needs of the dif- 
ferent types of inmates within a given 
institution, chaos would result. Undercon- 
trolled Aggressive people would be pun- 
ished for doing the same sorts of things 
that Chronically Overcontrolled people 
were being encouraged to do. This would 
naturally be interpreted as injustice and 
favoritism. It would, therefore, be neces- 
sary to treat the two types of offenders 
separately, either at different institutions 
or by incarcerating the Undercontrolled
offender while placing the Chronically 
Overcontrolled person on probation with 
outpatient therapy. However, since the 
Chronically Overcontrolled assaultive per- 
son is likely to have committed the more 
severe offense, it would be very difficult to 
obtain support either from the public or 
from legislative bodies for such a program. 

The proposed typology has implications 
for psychological theory as well as for clini- 
cal practice. The model of aggressive dy- 
namics which has been used throughout 
this investigation is the frustration-aggression
model originally proposed by Dollard
et al. (1939), in which instigation to aggression
was viewed as the result of frustration.
Whether or not this instigation
resulted in overt aggressive behavior de- 
pended upon the relative balance of insti- 
gation and inhibition. If the inhibitory 
forces exceeded the instigation to aggres- 
sion, no aggressive response should result; 
on the other hand, if instigation exceeded 
inhibition, in the presence of sufficient cues 
to aggression, then overt aggressive behavior
was likely.
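
The occurrence rule in this model can be stated as a one-line predicate. The sketch below is only a restatement of the paragraph above; the function name, the argument names, and the numerical units are hypothetical, since the model itself specifies no scale of measurement.

```python
def aggression_occurs(instigation, inhibition, cues_present=True):
    """Overt aggression is expected only when instigation exceeds
    inhibition and sufficient cues to aggression are present."""
    return cues_present and instigation > inhibition


# Arbitrary illustrative units.
print(aggression_occurs(instigation=4, inhibition=6))   # inhibition dominates
print(aggression_occurs(instigation=7, inhibition=6))   # instigation dominates
print(aggression_occurs(instigation=7, inhibition=6, cues_present=False))
```

Note that this rule says only whether a response occurs; it makes no statement about how intense the response will be.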

Attempts have been made to apply this 
schema to the prediction of the relative 
intensity of the overt aggressive response. 
The most frequent hypothesis is that the
intensity of the aggressive response is a 
function of the net strength of instigation 
minus inhibition. (If inhibition exceeds 
instigation, the result is less than zero and 
no response occurs.) Bandura and Walters 
(1959) write for instance: 


By subtracting the height of the curve represent- 
ing the strength of the inhibitory response from 
the height of the curve representing the strength 
of the inhibited response, it is possible to represent
the strength of the overt response that may be ex- 
pected at any point on the dissimilarity con- 
tinuum [p. 133]. 


This formulation, however, does not 
seem applicable to extremely assaultive,
Chronically Overcontrolled offenders for 
whom the violence of the overt act is out 
of all proportion to the immediate stimulus. 
In one case, for instance, a mild-mannered,
inoffensive bachelor was abused and in- 
sulted for weeks by an Undercontrolled 
Aggressive neighbor who had a long record 
of assaultive behavior. One night, after 
enduring 45 minutes of abuse, he found one 
final insult too much to bear and pro- 
ceeded to shoot his tormentor four times 
at close range. Immediately prior to that 
final insult, his instigation to aggression 
was apparently less than his inhibitions, 
for he made no aggressive response. The 
amount of instigation added by the next 
taunt was sufficient to increase the level of
instigation to the point where it finally 
exceeded the inhibitions and the violent 
aggressive response was elicited. 

In this case, then, the net strength of 
instigation minus inhibition was undoubtedly
quite low and the assault which
took place was out of all proportion to it.
On the other hand, the absolute level of 
instigation to aggression, which had been 
slowly building up over the months, was 
probably quite high. In this case, then, it 
would appear that the strength of the 
overt aggressive act was a function of the 
total amount of instigation and not the 
net strength of instigation minus inhibition. 
In fact, if the intensity of the aggressive 
act were a function of the net strength,
then one would predict that no person with 
excessive inhibitions could ever commit 
more than the mildest of aggressive acts, 
except under conditions of extreme provo- 
cation. However, the data in the present 
study indicate that the opposite is true; 
when excessively controlled people do 
aggress it is more likely to be in an ex- 
treme fashion with inadequate provoca- 
tion. (See Appendix B.) 
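
The contrast between the two formulations can be made concrete with a small sketch. Both functions and all of the numbers are hypothetical; the units are arbitrary and chosen only so that the Overcontrolled case has high instigation barely exceeding very high inhibition, as in the example of the bachelor above.

```python
def net_strength_intensity(instigation, inhibition):
    """Net-strength formulation: intensity equals instigation minus
    inhibition, with no response when the difference is not positive."""
    return max(0, instigation - inhibition)


def total_instigation_intensity(instigation, inhibition):
    """Total-instigation formulation: once instigation exceeds inhibition,
    intensity reflects the full accumulated instigation."""
    return instigation if instigation > inhibition else 0


# Hypothetical cases in arbitrary units.
overcontrolled = {"instigation": 91, "inhibition": 90}
undercontrolled = {"instigation": 30, "inhibition": 10}

for label, case in (("Overcontrolled", overcontrolled),
                    ("Undercontrolled", undercontrolled)):
    print(label,
          net_strength_intensity(**case),
          total_instigation_intensity(**case))
```

Under the net-strength rule the Overcontrolled case yields the milder response, whereas under the total-instigation rule it yields the more violent one, which is the ordering the present data favor.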

This notion that the violence of the overt 
aggressive act is a function of the total 
instigation to aggression, rather than the 
net strength, has, of course, been implicit 
during our whole discussion of the Chronically
Overcontrolled offender, for without
it his overwhelming violence with minimal 
external provocation is incomprehensible. 
However, while this formulation neatly accounts
for the paradoxically more extreme
aggression of the Chronically Overcon- 
trolled type as opposed to the Undercon- 
trolled Aggressive type, it is oversimplified. 
In essence it reduces to: The greater the 
instigation to aggression toward a target, 
the greater the degree of violence of the 
aggressive response to the target, if an 
aggressive response is allowed to occur. The 
difficulty with this formulation is that it 
overlooks the phenomenon of response 
generalization. A person who is strongly 
angered by his wife may be motivated to 
make a highly aggressive response, but, 
because of excessive inhibitions, suppress 
the response. One way he could deal with 
the situation would be to displace his ag- 
gression to another target (Miller, 1948). 
Another way would be to make a lesser 
aggressive response to the original target. 
Thus while his original inclination may 
have been to hit his wife, he may suppress 
this response and instead make a sarcastic
remark, slam the door, or behave with excessive
politeness in a passive-aggressive
manner. Thus, for any given target there
is a constant level of instigation to aggres- 
sion but any number of possible aggressive 
responses. Obviously, then, there is no simple 
direct relationship between instigation, in- 
hibition, and overt aggressiveness. 

This study has therefore highlighted two 
areas of oversimplification. The first was 
the oversimplified notion that aggressive 
behavior is associated only with deficient 
controls. The second was that the relative
balance of inhibition and aggression was
sufficient to account for the strength of 
the aggressive response in all situations. 
There is also a third area of thought which 
seems oversimplified in the light of the 
present investigation. This is the area of 
theories about the etiology of delinquent 
behavior. The public typically asks psychologists,
sociologists, and educators,
“What causes delinquency?” and, all too 
often, they have been willing to respond 
with some single explanation. Sometimes 
this explanation is complex such as “ano- 
mie” (Merton, 1957) or “superego lacunae” 
(Johnson, 1949). At other times it may be
quite simple such as “too much violence
on television.” 

If nothing else, this study should dem- 
onstrate that any attempt to establish a 
single, simple cause for crime or delin- 
quency is certain to fail. It is apparent 
that even within the relatively simple cate- 
gory of aggressive behavior there are vast 
differences in personality patterns among 
the people who engage in such behavior. If 
we expand the horizon to include the 
whole panorama of illegal behavior subsumed
under the headings of “crime” or
“delinquency,” ranging from dope peddling 
to income tax evasion, from safecracking 
to homosexuality, from traffic violations to 
murders, the futility of finding a single 
cause or a single cure can be seen. The 
first step needed is adequate classification 
based on empirical research; the next is 
study of the dynamics of each type or 
class to determine the appropriate treat- 
ment; the final step is applying the research 
so that instead of making the punishment 
fit the crime we can instead make the treat- 
ment fit the criminal. The present study 
represents one beginning to this first task 
of empirical classification. 


APPENDIX A 


TEN-POINT SCALE OF AGGRESSIVENESS ON WHICH ASSAULTIVE OFFENDERS WERE RATED 


SCALE VALUE   BEHAVIOR

 1   Subject showed good restraint. Resorted to aggression only when it was clearly
     dictated by circumstances, that is, hit back with equal or less force; self defense.

 2   Less restraint shown but degree of aggression still quite appropriate; or instrumental
     aggression (i.e., aggression whose primary motive is something other than
     inflicting pain, such as strong-arm robbery), with enough violence to accomplish the end
     goal, but no more.

 3   Aggression exceeds provocation, but not inappropriate in subculture; or instrumental
     aggressive acts where degree of violence begins to indicate that desire to
     inflict pain is also a motive.

 4   Aggression exceeds provocation even more but would not be viewed as a particularly
     extraordinary response by members of subculture, such as hitting person who calls
     defendant a name or ganging up on victim; or instrumental aggression which
     clearly exceeds amount needed to accomplish act.

 5   Acts of aggression clearly motivated by desire to inflict pain or injury. Culture and
     situation less supportive of degree of violence used. Would probably be rejected
     by adult members of subculture but not necessarily by peer group, for example,
     hitting when down. Violence at this point still not likely to seriously or permanently
     injure victim, although severe injuries might occur accidentally.

 6   Even less justification than (5): victim weaker or frailer. More apt to do serious
     harm (stomping), or use of weapon versus superior, unarmed antagonist.

 7   Serious aggression with inadequate provocation. Apt to result in serious injury to
     victim. Most members of subculture would feel use of this much violence in this
     situation unjustified, although it might still be sufficiently provocative to call for
     a lesser physical response such as use of weapon when called name or in gang fight
     versus unarmed opponents of equal or less size.

 8   More serious aggression. Death, or permanent disability quite likely. There may
     be some external motivation apparent for act, but it clearly does not justify this
     degree of response.

 9   Extremely severe aggression with serious consequence probable. Would be rejected
     by all in subculture as unjustified. Some glimmer of external motivation still
     apparent, for example, a murder or assault with a deadly weapon with little
     motivation, but in heat of anger.

10   Completely externally unprovoked, extremely serious aggression with extreme
     physical harm probable. No external motivation, for example, a “senseless”
     murder or assault with a deadly weapon, not even done in the heat of anger.


APPENDIX B
SUMMARY OF OFFENSES COMMITTED BY THE ASSAULTIVE SUBJECTS
EXTREMELY ASSAULTIVE OFFENSES

Subject numbers for the offenses listed below, in order: S 07009, S 02391, S 04116,
S 49797, S 98083, S 72044, S 92552, S 45091, S 97653


Rating: 9.0-10.0


Shot and killed mother with rifle from ambush. Also fired at father but missed. No known 
external provocation. 


Shot a strange woman with a rifle while cruising in a car with two friends looking for youths 
who had allegedly beaten him up. 


Fired at but missed an adult who had threatened to slap his face the day before. Later 
stated that he intended to kill rather than frighten the victim. 


Ratine: 8.0-8.9 


Tried to talk 19-year old housewife into letting him enter her home. When told to leave he 
grabbed her arm, pulled her through the chain-locked door, and hit her on the head witha 
gun. Upon gaining entrance he fought with her and then ran away. 


Rare: 7.0-7.9 


Defendant hit the victim in the eye twice, knocked him down and continued to hit the 
victim with both fists until blood stained his pants. Victim maintained there was no provoca- 
tion; the defendant said the victim had called his mother names and had dared him to hit 
him. 


Defendant harassed the victim verbally. The victim hit the defendant whereupon the 
defendant went home, secured a knife, returned and slashed the victim. 


Defendant felt (with some possible justification) that his teacher was persecuting him. He 
responded with passive aggression, but when he discovered he had been suspended, he 
returned and clubbed the teacher on the head with a mummified deer hoof. 


Rare: 6.0-6.9 


In the course of a violent family fight in which the defendant's brutal and intoxicated father 
was beating his mother, the defendant secured the father's pistol. When the father turned 
on him, his sister shouted at him to fire. He did so, killing the father. 


(See also 94887) Three drunken white boys had harassed a group of small Negro children. 
The defendant was one of a group of large Negro boys who then started a fight with the 
white boys. When one of the white boys threatened the defendant with a tennis racket, he 
pulled a zip gun and fired it. No one was hit. 


MODERATELY ASSAULTIVE OFFENSES


RATING: 5.0-5.9


S 44125 The defendant was one of three Negro boys who tried to taunt three white boys into a
fight. When the white boys did nothing, the defendant hit one of them in the jaw with
brass knuckles, chased and kicked him.


S 07232 The victim, during a school class, told the defendant he was an “ass.” Outside the classroom
the defendant grabbed the victim and hit him in the mouth with brass knuckles.


S 88663 Two counts of battery. No. 1: When his brother was in a fight the defendant assisted by
attacking the opponent with a wrench. No. 2: Was the principal aggressor when he and his
friends attacked some other young boys who were smaller than they were.


S 03805 Argued with large football player who offered to fight him. Armed self with a broken bottle
and waited for the football player. The victim got a long stick, and they attacked each
other.


S 92441 After the victim and defendant had boxed in school, the rumor spread that the victim had
challenged the defendant to an after-school fight. Later in a group, the victim denied this
challenge. The defendant suddenly hit the victim and knocked him down. The defendant’s
friends joined in and someone kneed and kicked the victim in the face.


RATING: 4.0-4.9


S 45986 Fighting broke out among a large group of teenagers. The victim tried to leave the group,
and the defendant chased him and hit him several times as he tried to get away.


Ss 18067 and 82579 A 51-year old man observed a group of juveniles on his fence. He told the group
to disperse, turned his back and the Ss attacked him knocking him to the ground causing
15 stitches to be required. Both these defendants were involved.


S 67614 The defendant walked up to victim, called him a “fink,” and knocked him out with his
fist. He possibly also kicked the victim when down. His excuse was that the victim had
once reported the defendant’s brother for some offense.


S 42363 The defendant and friends demanded a sailor give them a dime. The sailor ignored them,
so the defendant pushed the sailor, swung at him, and started a fight.


S 32410 In the course of an attempt to snatch a woman’s purse, the defendant struck the woman
on the head with a blunt object without provocation.


S 06750 The defendant was one of a group of boys aged 8 to 11 who stole the wallet from an
82-year-old cripple and then taunted and jostled him and also pushed an adult female who
attempted to assist.


S 53254 After the victim’s older brother had beaten up a friend of the defendant, the defendant and
his friends harassed the victim’s entire family. In this instance, he punched the victim
without provocation. The defendant maintained the victim had called him a “black nigger.”


S 53279 After crashing a party, the defendant got into an argument with a larger boy. He pulled
out a can-opener, planning to threaten him, but in the ensuing scuffle, the victim was
scratched.


S 96458 The defendant was one of 70 teenagers who milled around and blocked traffic. The defendant
hit a 52-year-old man in one of the cars in the face. When a peacemaker tried to interfere,
he and his friends chased him home, tore the clothes off him and wrecked his house.


RATING: 3.0-3.9


S 35148 The defendant struck the victim who was walking along the street. He claimed that the
victim had bumped into him, but the victim denied this.
S 60952 The defendant asked the victim for a nickel. The victim refused so the defendant shoved
him against some lockers. Three stitches had to be taken in the victim.


S 06347 In the course of brandishing a small pocket knife, the defendant inflicted a small stab wound
in a boy's back under possibly accidental circumstances.


RATING: 2.0-2.9


S 22841 In the course of a purse snatching in which the victim was knocked down, this defendant
kicked the victim in the face but inflicted no damage. He was wearing sneakers and said
that his foot slipped.


S 94887 (See also 97653) This defendant was another member of the group of Negroes who were 
in a fight with white boys who had been harassing some smaller children. This particular 
defendant was only minimally involved in the altercation. 


S 69900 The defendant and a friend accosted the victim in the schoolyard. The defendant and the 
victim had been feuding for some time. When the victim made a gesture as if he was going 
to hit the defendant, the defendant struck first, hard, causing a concussion. 


APPENDIX C 


BEHAVIOR CHECK LIST ON WHICH COUNSELORS REPORTED INSTANCES OF AGGRESSIVE
BEHAVIOR ON THE THIRD AND TENTH DAYS OF DETENTION


Instructions: In making these reports check those types of aggressive behavior you have noticed 
during that particular period. Be sure to look for the less obvious and more secretive or subtle types of 
aggressive behavior, as well as the more obvious kinds. It is important that you turn in these sheets 
at the end of the third and tenth day as they are completed. 

Check List: 

1. Physical Attack. Starting fights, hitting, pushing—unprovoked by verbal and physical attack 
of other children. 

2. Bragging. Assertively, with show of bravado: “I can do this better than you” sort of thing.

3. Threatening. Specific, hostile verbal or physical threat, or threatening act. 

4. Teasing. Including specific acts which appear designed to annoy or irritate, hurt, or humiliate. 

5. Saucy, Impertinent. “Smart-alecky.” 

6. Insulting, Name Calling. Direct face-to-face with object of hostility. 

7. Ridiculing, Mocking, Making-Fun-Of. 

8. Bullying. Another who is smaller or weaker, or who for some reason can't defend himself
effectively.

9. Verbal Castigation. Cursing, upbraiding, blaming, “giving somebody hell” verbally. 

10. Malicious Gossip, Depreciating, Defaming, or Tale Carrying (Tattle-Taling). 

11. Destructive. Breaks things, defaces walls, tears or dirties clothing or bedding, etc.

12. Temper Tantrums. Fits of rage, screams, kicks, scratches, etc. 

13. Running Away. 

14. I have not observed this boy at all due to my schedule or due to the fact he has been in
isolation the whole time.


Check list originated by Naylor (1952) and reproduced with his permission.

This item was added by the present investigator and did not appear on the original Naylor check
list.


APPENDIX D 


RATING SCALE FILLED OUT BY COUNSELORS ON THIRD AND TENTH DAYS OF DETENTION

Instructions: Check the point on each scale which in your opinion best describes the behavior of
this child during the past week.

In making these ratings, try to compare him with all the other children you have known. Judge him
with respect to each quality independently; that is, judge objectively and try not to be influenced by
your general impression of him.

You may check any place on a scale.
Be sure to rate him on all five scales.
Each child is to be given a rating on the third and tenth day of custody.


Scale 1. Uncooperative-Cooperative

1. Extremely uncooperative; refuses to follow any suggestions; unwilling, antagonistic.
2. Uncooperative: replies perfunctorily to questions; indifferent.
3. Takes situations for granted; responds willingly but volunteers little.
4. Likes being asked to do things; volunteers occasionally.
5. Very cooperative. Volunteers help readily; anxious to do anything asked.


Scale 2. Amiable-Quarrelsome

1. Actively dislikes quarrels. Acts as peacemaker. Good humored.
2. Has sunny disposition. Quarrels less than average.
3. Quarrels under real provocation; occasionally starts quarrel. Generally amiable.
4. Quarrels more than average child.
5. Pronounced tendency to be quarrelsome; has a “chip on the shoulder.”


Scale 3. Aggressive-Submissive

1. Threatens others; dominant; reacts to reproof violently; overtly aggressive; starts trouble.
2. Seldom or reluctantly gives in; reacts to violence with violence. Threatens others.
3. Complies with normal authority; reacts with violence only when provoked.
4. Gives in readily; objects to violence with “Stop!” but not with blows.
5. Complies with all requests; submits to violence without doing anything about it.


Scale 4. Docile-Rebellious

1. Passively agrees to everything; no sign of resistance or unwillingness.
2. Tends to accept suggestions and do what he is told without resistance.
3. Conforms normally to all reasonable requests and accepts authority as necessary.
4. Tends to resist authority but will conform if enough pressure is put on him.
5. Hostilely defiant; rejects all suggestions and resists any restraint.


Scale 5. Antagonistic-Friendly

1. Marked hostility, suspiciousness, or unfriendliness.
2. Not as marked as 1, but less friendly than the average child.
3. About like the average. Has both likes and dislikes.
4. More friendly and outgoing than the average child, but not as marked as 5.
5. Exceptionally outgoing and friendly. Likes practically everyone and wants them to like him.


Rating scale originated by Naylor (1952) and reproduced with his permission.
APPENDIX E


TABLE E1


LISTS OF “OVERCONTROLLED” AND “UNDERCONTROLLED” ADJECTIVES FROM THE
GOUGH ADJECTIVE CHECK LIST


Overcontrolled Undercontrolled 
Adjective No. Adjective Adjective No. Adjective 
28 Cautious 7 Aggressive 
43 Conscientious 14 Argumentative 
45 Considerate 17 Assertive 
49 Cooperative 23 Boastful 
85 Fearful 24 Bossy 
100 Gentle 52 Cruel 
111 Helpful 59 Demanding 
129 Inhibited 70 Dominant 
146 Mannerly 114 Hostile 
149 Meek 121 Impulsive 
158 Nervous 138 Irritable 
171 Peaceable 144 Loud 
191 Quiet 152 Mischievous 
207 Retiring 168 Outspoken 
214 Self-controlled 188 Quarrelsome 
230 Shy 197 Rebellious 
253 Submissive 210 Rude 
268 Timid 211 Sarcastic 
297 Withdrawn 228 Show-off 
299 Worrying 271 Tough 
GENERALITY OF WORD-ASSOCIATION RESPONSE SETS
LOUIS J. MORAN 


University of Texas 


The tendency of some Ss to give predominately 1 category of associate in 
the word-association experiment, regardless of word list, was examined in a 
series of studies. Earlier findings on such tendencies were confirmed with 
several samples of college students and with Spanish-speaking Ss. A
4th such tendency, or idiodynamic set, perceptual-referent (Jung’s “predication
type”), was added to the 3 sets previously described. Evidence was
presented for the sets as more generally representing bases for matching
word pairs, forming a hierarchy of increasing linguistic sophistication in the
order: perceptual referent, object referent, concept referent, and dimension
referent. Word-association commonality was discussed as an arbitrary average
across several stable subhierarchies.


In the “free” word-association experiment, where the subject is instructed to
give the first word that occurs to him when he hears the stimulus word, some
individuals tend to give predominately one “category” of associate. Because
these individuals are highly consistent in giving one characteristic category
of associate on different occasions and to different word lists, they are said
to manifest enduring “idiodynamic” associative sets. Three such sets recently
have been described (Moran, Mefferd, & Kimble, 1964). Individuals with an
object-referent set tend to give associates categorized as “functional,”
usually naming another object that is associated in everyday experience with
the object denoted by the stimulus word, for example, FOOT, shoe; BOAT, dock.
Other individuals evidence a concept-referent set, characterized by a
preponderance of synonym (e.g., SMALL, little) and superordinate (e.g.,
CABBAGE, vegetable) associates. A third group of individuals tends to give
very fast contrast (e.g., BLACK, white) and logical coordinate (e.g., APPLE,
orange) associates and is said to have a set for speed in responding.

Dependent variables in the free word-association experiment are significantly
influenced by an interaction between the subject’s idiodynamic set and the set
compatibility of the stimulus words. Response faults (e.g., delayed reaction
time, blank, multiword, etc.) occur less frequently to stimulus words that are
most compatible with the subject’s set. Commonality score (degree to which a
subject’s associates correspond to those of a normative group) is a partial
function of the number of stimulus words in the list that happen to be
compatible with the subject’s set. Grammatical form of associates is
influenced by set: object-referent, concept-referent, and speed sets tend to
produce noun, verb, and adjective associates, respectively (Moran et al.,
1964).

* This investigation was supported by Research Grant MH-08778 from the
National Institutes of Health, United States Public Health Service.

The studies which follow were designed 
to determine the generality and reliability 
of specific idiodynamic associative sets and 
to seek a unifying rationale for these indi- 
vidual differences in linguistic habits. 


SET-COMPATIBLE WORD LISTS


The pervasive influence of associative sets
needs to be taken into account in the con- 
struction of word lists. To illustrate, of the 
100 primary (most popular) response words 
in the Kent-Rosanoff list, 38 are contrast or 
coordinate associates, yielding a sum com- 
monality score of 1,746. Only 10 primary 
response words in the list are synonym or 
superordinate associates, yielding a sum 
commonality score of 351 (Russell & Jenkins,
1954). Because of this inequality, it
may be predicted that subjects with a con- 
cept-referent set will commit more faults 
and achieve a lower commonality score on 
the Kent-Rosanoff list than will subjects 
with a so-called “speed” set. Indeed, even 
the seemingly well-established speed set of 
subjects who give many contrast-coordinate 
associates may be an artifact of biased word 
lists. Studies that have reported a very 
marked increase in contrast associates in- 
duced by time pressure (e.g., Siipola, 
Walker, & Kolb, 1955) have used word lists 
that were “loaded” with popular contrast- 
evoking stimulus words. Time pressure may 
well increase the frequency of popular associates
(Horton, Marlowe, & Crowne, 1963);
but whether these popular associates turn 
out to be predominately contrasts, or syno- 
nyms, or some other category is a function 
of the word list selected by the experimenter. 
The same ambiguity attends some interpre- 
tations of alleged changes in category of 
associations in developmental studies. As 
children approach adulthood they will, by 
definition, tend to give more adultlike con- 
trast associates to the Kent-Rosanoff list. It 
should be noted, however, that to a different 
word list, one that evoked predominately 
synonym associates from adults, the same 
maturing children would tend, by definition, 
to give more adultlike synonym (rather 
than contrast) associates. 

By a careful selection of stimulus words, 
lists may be constructed that are equally 
compatible with two or more associative 
sets. For example, words like BLOSSOM, which
evoke predominately synonyms (flower,
672) may be counterbalanced in the same
list with words like GIRL, which evoke contrast
associates with nearly the same frequency
(boy, 670) (Russell & Jenkins,
1954). Such a list should be reasonably
compatible with either a set to give synonyms
or a set to give contrasts. The frequency
tables for 400 stimulus words, based
upon the associations of 196 men (Moran
et al., 1964), were used in this manner to
construct two word lists, equally compatible 
with the three sets described above. 

It was possible to locate among the 400
words, 20 words which had elicited associates
with near equal frequency in all three
categories: for example, HEAT elicited fire, 
warm, and cold with nearly the same fre- 
quency. These words were assigned 10 to 
List A and 10 to List B, so that overall set 
compatibility was maintained within each 
list and between lists. 

An additional 24 stimulus words were 
found which had elicited, with nearly equal 
frequency, responses compatible with two 
of the sets, but not the third; for example, 
BEAUTIFUL elicited pretty and ugly, but no 
associates categorized as object referent.
These words were assigned 12 to List A and 
12 to List B, so that all three sets were 
equally represented within and between 
lists. 

Of the remaining 356 stimulus words in
the pool, 36 had elicited predominately a
response word compatible with one set only:
for example, DARK elicited mainly light. Lists
A and B each received 18 of these words,
matched overall for compatibility with the
three sets. The last 18 words in each list
were ordered systematically, with respect to
the predominate category evoked, that is,
contrast-coordinate, synonym-superordinate,
functional, contrast-coordinate, etc.

The two 40-word lists, then, each contained
Words 1-10 equally compatible with
all three sets, Words 11-22 each compatible
with two sets but not the third, and Words
23-40 each compatible with only one of the
three sets. The two lists were carefully
matched for overall compatibility with the
three sets. The complete word list is provided
in the Appendix.


Subjects 


Two large classes in freshman psychology
at the University of Texas, totaling 482,
served as subjects.


Administration of Word Lists 


Subjects were instructed to write the first
word that came to mind when they heard
the word read by the examiner over an
auditorium speaker system. They were told
that the words would be read at 5-second
intervals and to leave a blank space if no
response word came to mind. Also, they were
asked not to change a response or to return
to fill in a blank space later. Words were
read in the order: List A, Words 1-10; B,
1-10; A, 11-22; B, 11-22; A, 23-40; B,
23-40. The same examiner tested all subjects.
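
The interleaved reading order described above can be sketched in a few lines. This is only an illustration: the function name `presentation_order` and the (list, word number) tuples are assumptions introduced here, not part of the original procedure. The sketch shows that the first 44 words read are Words 1-22 of both lists and the last 36 are Words 23-40 of both lists.

```python
def presentation_order():
    """Return the 80 (list, word_number) pairs in the order they were read.

    Hypothetical helper: blocks of List A and List B alternate as
    A 1-10, B 1-10, A 11-22, B 11-22, A 23-40, B 23-40.
    """
    blocks = [(1, 10), (11, 22), (23, 40)]
    order = []
    for first, last in blocks:
        for word_list in ("A", "B"):
            order.extend((word_list, n) for n in range(first, last + 1))
    return order


order = presentation_order()
predictor_words = order[:44]   # Words 1-22 of Lists A and B
criterion_words = order[44:]   # Words 23-40 of Lists A and B
print(len(order), len(predictor_words), len(criterion_words))  # 80 44 36
```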


Variables


The following variables were scored. Most
of the scoring was clerical, using a manual
consisting of the prescored responses of 196
men (Moran et al., 1964).

1. Functional. Stimulus word and response
word each separately denote entities
or processes between which there is an explicit
functional relationship, for example,
FOOT, shoe.

2. Synonym or superordinate. Synonym:
response word has exactly the same meaning
as the stimulus word in one or more
ordinary and appropriate contexts, for example,
BLOSSOM, flower. Superordinate:
stimulus word denotes an immediate member
of the class or category denoted by
response word, for example, CABBAGE, vegetable.

3. Contrast or logical coordinate. Contrast:
response word negates or contrasts
with the meaning of stimulus word in one or
more ordinary and appropriate contexts,
for example, DARK, light. Logical coordinate:
stimulus word and response word separately
denote immediate members (of equal
logical order) of the same class or category,
for example, BLUE, yellow.

4. Total faults. Sum of blanks and multi- 
words. 

5. Commonality. Sum frequency of sub- 
ject's associates also given by the 196 men: 
that is, each response word received a 
“score” corresponding to the number in the 
normative group that gave the same re- 
sponse word. 
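
The commonality scoring rule lends itself to a short sketch. The norm counts below are invented for illustration and are not the frequencies of the 196 men; the names `commonality_score` and `HYPOTHETICAL_NORMS` are likewise assumptions. Blank responses and responses absent from the norms contribute nothing to the sum.

```python
def commonality_score(responses, norms):
    """Sum, over stimulus words, the number of normative subjects who
    gave the same response word as this subject."""
    total = 0
    for stimulus, response in responses.items():
        if response is None:  # blank response: scored as a fault, adds 0 here
            continue
        total += norms.get(stimulus, {}).get(response.lower(), 0)
    return total


# Hypothetical norm table: stimulus -> {response: number of normative subjects}
HYPOTHETICAL_NORMS = {
    "HEAT": {"fire": 40, "warm": 38, "cold": 39},
    "DARK": {"light": 120, "night": 30},
}

subject = {"HEAT": "cold", "DARK": "light", "BLOSSOM": None}
print(commonality_score(subject, HYPOTHETICAL_NORMS))  # 39 + 120 = 159
```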


Statistical Analyses²


All correlations in this report are Pearson
product-moment correlation coefficients. All
factor analyses are for Principal Components,
rotated by the normalized varimax
method, with unities placed in the diagonal.
Factor extraction in all analyses was
stopped when eigenvalues dropped below
unity.

² Computations were carried out at the Computation
Center of the University of Texas, with
programs compiled by Don Veldman, of the Department
of Educational Psychology, to whom
appreciation is expressed for his very generous and
helpful consultations on statistical problems.


Equated Word Lists


Means, standard deviations, and inter- 
correlations for Lists A and B, based upon 
the 482 college students, are given in Table 
1. Here it may be seen that the mean fre- 
quency of functional and of contrast-co- 
ordinate associates was nearly equal in both 
lists, with synonym-superordinate associates 
somewhat less frequent on both lists. In- 
ternal consistencies of the variables ranged 
from .56 to .84, after correction for an 80- 
word list. List A was less “difficult” (higher 
commonality and fewer faults) than List B. 
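
The report does not name the correction formula. A minimal sketch, assuming the Spearman-Brown formula for a test of doubled length was applied to the correlation between the two 40-word lists: under that assumption the List A with List B correlations for functional (.39) and contrast-coordinate (.73) associates in Table 1 reproduce the end points of the reported range. The function name `spearman_brown` is introduced here for illustration.

```python
def spearman_brown(r, length_factor=2.0):
    """Projected reliability when a test is lengthened by length_factor,
    given the correlation r between the existing parts."""
    return length_factor * r / (1.0 + (length_factor - 1.0) * r)


# Cross-list correlations taken from Table 1 (List A variable with the
# same variable in List B); the choice of formula is an assumption.
for label, r in [("functional", 0.39), ("contrast-coordinate", 0.73)]:
    print(label, round(spearman_brown(r), 2))
# functional 0.56
# contrast-coordinate 0.84
```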

The five variables for List A and for List 
B were factor analyzed jointly, as shown in 
Table 2. Factor I represents the contrast- 
coordinate associates in both lists, loading 
.75 in List A and .74 in List B. Functional 
associates are represented by Factor II,
loading .76 in List A and .83 in List B. Fac- 
tor III represents the synonym-superordi- 
nate associates, which loaded .84 and .86 in 
Lists A and B, respectively. Thus, the same 
three factors found in earlier samples 
(Moran et al., 1964) appeared also in the 
present college sample. 

The consistent appearance of these three 
factors in samples of subjects tested on 4 
successive days, with different word lists 
each day, led to the initial postulation of the 
three idiodynamic sets described above. In
a table much like Table 3, below, it was 
demonstrated that the subjects who had 
evidenced a definite set had achieved a 
higher commonality score and had com- 
mitted the fewest faults on the stimulus 
words most compatible with their set 
(Moran et al., 1964). For the present sam- 
ple of 482 students, predictions were made 
from associates on one group of stimulus 
words to an independent group of criterion 
stimulus words. Subjects were selected to 
represent a set if they had a standard score 
greater than 1.00 on one of the three set 
variables, and less than .00 on the other 
two set variables, based upon their first 44 
TABLE 1
MEANS, STANDARD DEVIATIONS, AND INTERCORRELATIONS OF 10 VARIABLES FOR 482 FRESHMEN

Variable | Mean | SD | Intercorrelations (1 2 3 4 5 6 7 8)
List A
1. Functional | 9.8 | 3.0
2. Synonym-superordinate | 7.4 | 2.8 | -.08
3. Contrast-coordinate | 10.0 | 4.2 | -.33 | -.07
4. Total faults | 3.0 | 2.5 | -.17 | -.28 | -.33
5. Commonality | 627.2 | 151.5 | .34 | .55 | -.44
List B
6. Functional | 9.0 | 3.3 | .39 | -.06 | -.24 | -.13 | -.02
7. Synonym-superordinate | 7.8 | 3.1 | .02 | .53 | -.08 | -.10 | .23 | -.24
8. Contrast-coordinate | 9.3 | 4.5 | -.22 | -.02 | .73 | -.21 | .49 | -.84 | -.08
9. Total faults | 3.5 | 2.8 | -.05 | -.16 | -.24 | .53 | -.30 | -.10 | -.24 | -.29
10. Commonality | 585.2 | 151.1 | .07 | .15 | .46 | -.30 | .57 | .17 | .25 | .57


associates only (Words 1-22 in Lists A 
and B). It will be recalled that the last 18 
words in each list consisted of three groups 
of 6 words each which had elicited (from 
196 men) predominately associates com- 
patible with only one set. The last 18 words 
in both lists were combined so that each 
type of stimulus word was represented by 
12 set-specific stimulus words. It was predicted
that subjects who evidenced a set
on the first 44 words would achieve their 
highest commonality score on the group of 
criterion stimulus words most compatible 
with their set. Faults were too infrequent
(mean below one) on these high-commonality
words to permit analysis.
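
The selection rule for set-representative subjects can be sketched as follows. The variable names and the example counts are hypothetical, and the use of the population standard deviation in `standardize` is an assumption, since the report does not say how standard scores were computed.

```python
from statistics import mean, pstdev

SETS = ("functional", "synonym_superordinate", "contrast_coordinate")


def standardize(counts):
    """Convert raw counts for one variable into standard (z) scores."""
    m, s = mean(counts), pstdev(counts)
    return [(c - m) / s for c in counts]


def classify(z):
    """Return the set a subject represents, or None.

    A subject represents a set when the z score on that set variable
    exceeds 1.00 and the z scores on the other two are below .00.
    """
    for name in SETS:
        others = [z[o] for o in SETS if o != name]
        if z[name] > 1.00 and all(v < 0.00 for v in others):
            return name
    return None


# Hypothetical z scores for two subjects, based on their first 44 associates
print(classify({"functional": 1.4, "synonym_superordinate": -0.3,
                "contrast_coordinate": -0.8}))   # functional
print(classify({"functional": 1.4, "synonym_superordinate": 0.2,
                "contrast_coordinate": -0.8}))   # None
```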

Note first in Table 3 that average commonality
values obtained on the three
groups of stimulus words remained equated
for the present sample, being 26.0, 25.5, and
25.9. As predicted, however, subjects
achieved their highest commonality scores
on stimulus words most compatible with
their own set. The commonality score of an
individual thus is a partial function of the
interaction between his idiodynamic associative
set and the set compatibility of the
stimulus words. To illustrate, the mean com-

TABLE 2
NORMALIZED VARIMAX ROTATED FACTORS FOR 482 FRESHMEN
Variable Factor
I II III h²
List A 
1. Functional .05 76 .01 .58 
2. Synonym-superordinate .15 —.02 84 73 
3. Contrast-coordinate 75 —.47 —.93 83 
4, Total faults —.64 — 129 = 122 5 
. Commonality .78 -= 3 j 
nic .01 .24 .66 
6. Functional .07 83 —.19 72 
7. Synonym-superordinate .10 —.09 .86 126 
8. Contrast-coordinate 74 =48 m .83 
9. Total faults = iez — 123 =a 48 
10. Commonality «79 “06 “09 63 
TABLE 3
EFFECT OF SET UPON COMMONALITY SCORES

                                        Type of stimulus word
 N   Type of subject         Functional   Synonym-superordinate   Contrast-coordinate   Mean
                             (Average commonality [%] of responses)

41   (F) Object-referent        28.1             21.5                    23.5           24.4
25   (SS) Concept-referent      25.4             28.6                    20.6           24.9
39   (CC) Speed set             24.4             26.5                    33.7           28.2
     Mean                       26.0             25.5                    25.9


Note.—Type of subject represented by subjects with z score greater than 1.00 on variables representing
one set and less than .00 on variables representing the other two sets, based upon first 44 associates.
Type of stimulus word each represented by 12 words that had elicited predominately one type of
response from 196 men; these were the last 36 stimulus words in the list.


monality score (i.e., averaged across the
three types of stimulus words) of object-
referent and concept-referent set subjects
was almost the same, 24.4 and 24.9, respec-
tively. But if all stimulus words had been 
of the “functional” type, the object-referent 
set subjects would have achieved a 13% 
higher commonality score; if all had been of 
the synonym-superordinate type, a 12% 
lower score. Thus, the type of stimulus 
words in a list could make a 25% difference 
in the commonality score achieved by these 
subjects. 
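
The marginal means quoted from Table 3 can be checked with a short script. The cell values are those printed in the table; treating the margins as simple unweighted means of the cells is an assumption made here, though it does reproduce the printed figures. The name `TABLE_3` is introduced for illustration.

```python
# Cell values from Table 3: rows are types of subject, columns are
# functional, synonym-superordinate, and contrast-coordinate stimulus words.
TABLE_3 = {
    "object-referent": [28.1, 21.5, 23.5],
    "concept-referent": [25.4, 28.6, 20.6],
    "speed set": [24.4, 26.5, 33.7],
}

# Unweighted means (an assumption; group sizes are ignored).
row_means = {name: round(sum(cells) / len(cells), 1)
             for name, cells in TABLE_3.items()}
column_means = [round(sum(col) / len(col), 1)
                for col in zip(*TABLE_3.values())]

print(row_means)     # {'object-referent': 24.4, 'concept-referent': 24.9, 'speed set': 28.2}
print(column_means)  # [26.0, 25.5, 25.9]
```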

The present word lists proved to be rea- 
sonably well equated for set compatibility, 
though split-half reliability and fault-evoking
potential were fairly low. Since faults
(as well as reaction time, i.e., Marbe’s Law) 
are highly correlated with commonality 
(Laffal, 1955; Moran et al., 1964), some in- 
formation on these variables may be gleaned 
indirectly from observed variability in the 
commonality measure in the present study 
and in those to follow. 

Prior to an account of the application of 
these lists in a series of studies, attention 
should be called to the relevance of idiody- 
namic associative sets to traditional com- 
monality norm tables. Results on the pres- 
ent 482 students provide some information 
on the composition of word-association 
commonality norms in general. 


COMMONALITY NORMS AND INDIVIDUAL
SET HIERARCHIES


In the early Würzburg studies, the commonality
value of stimulus-response word
pairs (“individual strength of the reproduction”),
interacting with momentary “determining
tendency” (“operating task,” or set)
provided an account of “thinking.” In these
controlled association studies the subject
usually was provided a series of different
associative tasks (sets) by the experimenter.
Watt’s (1964) observations on the interaction
between “strength of the reproduction”
and the operation of sets provide an
interesting and dynamic account of the results
reported in Table 3, above.


...the influences which determine every event in
our mental experience fall into two large groups, 
the operating task and the individual strength of 
the reproductions which come thereby into ques- 
tion. On the one hand, the task may find no re- 
productions, in which case no reaction can occur; 
and, on the other hand, the strength of the tend- 
ency to reproduction may be too great for the 
task to operate, in which case it forces its way out 
in spite of the task, or before any reproduction 
which the task favors has had time to become 
actual: in other words, a wrong reaction takes 
place. Otherwise more or less suitable reactions 
occur. This is thought to be valid for the whole of
our mental experience . . . [p. 194]. 


Watt, in setting different tasks (sets) for 
his subjects on the same stimulus words, 
observed directly the facilitative and in- 
hibitive effects of set, depending upon com- 
patibility of a specific set with an assumed 
a priori hierarchy of bond strength between 
word pairs. This concept of a universal, 
enduring, a priori hierarchy of associates of 
varying “strengths” as reflected by word- 
association commonality norms is widely 
held today, as indicated in a review of psycholinguistics
by Rubenstein and Aborn
(1960).


...a number of studies concerned with the proba- 
bility of language segments and with word associa- 
tion have brought forth a point of view which 
stresses the significance of the concept of response 
hierarchy in interpreting the subject's performance 
in various verbal tasks. These studies have sug- 
gested, and to some extent supported, the following 
hypotheses: 

(a) Differential exposure to language segments 
(letters, words, etc.) produces in the individual a
set of correlated probabilities of emitting those 
segments. 

(b) Since segments in natural languages are 
characterized by inequality in frequency of oc- 
currence, experience with language—both in send- 
ing and receiving messages—imparts to the indi- 
vidual an isomorphic response hierarchy. 

(c) Because members of the same linguistic 
community share a common language experience, 
their response hierarchies are similar [p. 291]. 


The Minnesota norms may be used to il- 
lustrate the common-response hierarchy re- 
ferred to above. The Minnesota commonal- 
ity tables report the associates of about 
1,000 students to the 100-word Kent-Rosan- 
off list. From these norms it would be determined,
for example, that to WOMAN the word
man has much greater association strength 
(given by 646 subjects) than has the word 
lady (given by only 15 subjects) and that 
to the word AFRAID, the word scared (240) 
has greater association strength than has the 
word brave (62) (Russell & Jenkins, 1954). 

As Watt discovered, and as indicated in 
Table 3 and in Moran et al. (1964), the as- 
sociates of a subject with a specific set fre- 
quently run counter to those predicted by 
such a common-response hierarchy. Unlike 
Watt’s temporarily induced sets, the idiody- 
namic sets postulated in the present study 
are enduring characteristics, traits, of indi- 
vidual subjects. Since the associations of 
the three types of set deviate in a consist-
ent, predictable manner from norms based 
upon the “generalized other” in the same 
linguistic community, it should be of inter- 
est to examine their deviate association 
hierarchies in relation to the general com- 
monality norms. To this end, the associates 
of the subjects used to represent sets in 
Table 3 were tallied separately, providing 
independent commonality norms for object-referent,
concept-referent, and speed-set
subjects. The results are presented in Table
4.

On the left side of Table 4 are commonal- 
ity norms on 482 subjects, presented in the 
traditional manner. Here it may be seen, 
for example, that the association strength 
of DIE-live is approximately twice that of
DIE-death, and that of DIE-death about twice
the association strength of DIE-dead, etc.

In the center of Table 4, the relative 
frequencies of “normative” primary, sec- 
ondary, and tertiary associates are given 
separately for each of the three set types. 
Note that not one of the object-referent set 
subjects gave the “primary” associate to 
DIE or STEEL; not one of the concept-referent 
set subjects gave the primary associate to 
HIT or YELLOW; and virtually no speed-set 
subjects gave the primary associate to HAM 
or FIDDLE. These are extreme examples, to 
be sure. Examples like those given in Table 
4 for PAIL and RADIO are much more common,
however, and illustrate the same
point: “Commonality itself is a complex 
variable, a resultant score derived from 
consensus within different set types [Moran 
et al., 1964, p. 10].” The effects of averaging 
across frequencies in the three individual 
response hierarchies may be seen on the far 
right of Table 4. These averages across sets 
yield essentially the same overall hierarchy 
as that of the commonality norms on the 
left in Table 4. 
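
The averaging described above can be checked by hand. The following is a minimal Python sketch that uses only the DIE rows of Table 4. The dictionary layout and the function name `average_across_sets` are illustrative assumptions, and the average is taken here as an unweighted mean of the three set percentages, which is the reading that reproduces the tabled averages.

```python
# Percentages for the stimulus word DIE, copied from Table 4.
# Tuple order: (object-referent, concept-referent, speed set).
DIE_SET_HIERARCHIES = {
    "live": (0, 12, 72),
    "death": (11, 28, 7),
    "dead": (0, 12, 0),
}

# Commonality norms for the same associates (N = 482), also from Table 4.
DIE_COMMONALITY = {"live": 27, "death": 12, "dead": 6}


def average_across_sets(hierarchies):
    """Return the unweighted mean percentage per associate, rounded."""
    return {
        word: round(sum(values) / len(values))
        for word, values in hierarchies.items()
    }


averages = average_across_sets(DIE_SET_HIERARCHIES)

# Rank order of associates under each scheme (strongest first).
rank_by_average = sorted(averages, key=averages.get, reverse=True)
rank_by_commonality = sorted(DIE_COMMONALITY, key=DIE_COMMONALITY.get, reverse=True)

for word in rank_by_average:
    print(f"DIE-{word}: set average {averages[word]}%, "
          f"commonality {DIE_COMMONALITY[word]}%")
```

For this stimulus word the two rank orders agree, even though no single set type shows that hierarchy on its own.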

Later in this report, after an additional 
(perceptual-referent) set has been described 
and other aspects of associative sets ex- 
amined, the arbitrariness of commonality 
norms as a measure of general “association 
strength" will be even more evident. 


IDIODYNAMIC ASSOCIATIVE SETS OF SPANISH-
SPEAKING SUBJECTS IN COLLABORATION
WITH RAFAEL NÚÑEZ, NATIONAL
UNIVERSITY OF MEXICO


Studies in which commonality tables of
English-speaking subjects have been compared
with similar tables based upon associates
of foreign-language subjects (e.g.,
Esper, 1918; Miron & Wolfe, 1964; Rosenzweig,
1964) have all found a great similarity
in response hierarchies. The present
study was undertaken to determine whether
this interlinguistic similarity extended to
idiodynamic sets.


TABLE 4
COMMONALITY AS AN AVERAGE OF INDIVIDUAL SET HIERARCHIES

                 Commonality          Set hierarchies (%)
                 table (%)    Object-     Concept-    Speed      Average of
Stimulus and                  referent    referent    set        three sets
associate        N = 482      N = 4[?]    N = 32      N = 53     (%)

DIE
  Live              27            0          12          72          28
  Death             12           11          28           7          15
  Dead               6            0          12           0           4
STEEL
  Iron              18            0           9          28          12
  Thief              8           21           6           9          12
  Rob                8            0          16           9           8
HIT
  Hurt              32           32           0          15          16
  Strike            14           13          31          15          20
  Run               10           11           9           9          10
YELLOW
  Green             14            6           0          36          14
  Blue               8            4           0          19           8
  Color              8            6          22           0           9
HAM
  Pig               20           43          22           4          23
  Eggs              17           11           6          36          18
  Meat               7            0          28           0           9
FIDDLE
  Music             25           62          38           7          36
  Violin            21            6          28          30          21
  Play               9           17           9           0           9
PAIL
  White             23            4          41          11          19
  Sick              13           21          16           0          12
  Bucket            12            8           0          23          10
RADIO
  Music             35           68          25          13          35
  TV                29            6          16          74          32
  Listen             6            8           9           0           6


Procedure 

The subjects consisted of four groups of Span- 
ish-speaking students, 206 in all, at the National 
University of Mexico. The 80-word list was read 
in Spanish, in the same manner as in the preceding 
study, and responses written in Spanish were 
scored clerically from a prescored manual, based 
upon the associates of the 482 English-speaking 
sample, with commonality scored in the same man- 
ner, but based upon the 482 sample instead of the 
original sample of 196 men. 


Findings 

The means, standard deviations, and in- 
tercorrelations of the variables are provided 
in Table 5. Shown in Table 6 are the results 
of a normalized varimax rotation of Princi- 
pal Component factors. 


The same three factors found in earlier 
samples of English-speaking subjects ap- 
peared also in the Spanish-speaking sample. 
In Table 6, contrast-coordinate (Factor I), 
synonym-superordinate (Factor II), and 
functional (Factor III) clearly represent 
the three independent associative modes. 

TABLE 5
MEANS, STANDARD DEVIATIONS, AND INTERCORRELATIONS FOR 206 MEXICAN STUDENTS

                                            Intercorrelations
Variable             Mean       SD       1      2      3      4      5      6
1. Functional        15.2      5.2
2. Synonym            4.6      2.8     .04
3. Superordinate      6.0      4.5     .06    .36
4. Contrast           5.9      5.1     .01   -.09   -.17
5. Coordinate         5.7      4.7    -.19    [?]   -.09    .64
6. Faults             4.7      5.6     .32   -.10    .10   -.25   -.19
7. Commonality     4111.3   1615.4     .45    [?]    .03    .75    .48   -.32


To demonstrate idiodynamic sets, subjects
were selected to represent a set if they
had a standard score greater than .50 on the
variables representative of a set and less
than .50 on the variables representative of
the other two sets, based upon their first
44 associates only. These cutting scores,
rather than 1.00 versus .00 as in the preceding
study, were used because of the smaller
sample and, of course, yielded groups with
less pronounced sets. The prediction was
made that groups evidencing a set would
achieve a higher commonality score on the
12 criterion stimulus words in the second
part of the lists that were most compatible
with their set.
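
The selection rule just described can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the scoring procedure actually used: it assumes that each subject has already been reduced to one composite standard score per set (the text does not say how the two variables of a set were combined), and the names `represents_set`, `classify`, and the example subjects are hypothetical.

```python
def represents_set(z_scores, target, cutoff=0.50):
    """True if z exceeds the cutoff on the target set and falls below it
    on every other set."""
    return all(
        (z > cutoff) if name == target else (z < cutoff)
        for name, z in z_scores.items()
    )


def classify(z_scores, cutoff=0.50):
    """Return the name of the set a subject represents, or None."""
    for name in z_scores:
        if represents_set(z_scores, name, cutoff):
            return name
    return None


# Hypothetical subjects: composite z scores on the first 44 associates.
subject_a = {"functional": 0.9, "synonym_superordinate": -0.2,
             "contrast_coordinate": 0.1}
subject_b = {"functional": 0.8, "synonym_superordinate": 0.7,
             "contrast_coordinate": -0.4}

print(classify(subject_a))  # high on one set only
print(classify(subject_b))  # high on two sets, so represents none
```

Because a subject must be above the cutoff on exactly one set, at most one set can be assigned; subjects elevated on two sets, or on none, are left out of the comparison.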

Because the means on commonality for 
the three different groups of 12 words were 
very different (functional, 1,163; syno- 
nym-superordinate, 380; contrast-coordi- 
nate, 1,525), standard scores (based upon a 
total sample of 206) were reported in Table 
7. The reason for these mean differences is 
unclear; some of the differences may be 
attributable to difficulties in translation of 
both the stimulus and the response words. 
The important point illustrated by Table 7, 
however, is the indication of the operation 
of set. In every instance, the sample of sub- 
jects selected to represent a set achieved a 
higher commonality score on the criterion 
stimulus words most compatible with their 
set, as predicted. Clearly, the Mexican 
Spanish-speaking college students evidenced 
the same three idiodynamic associative sets
observed in English-speaking subjects. 


TABLE 6
NORMALIZED VARIMAX ROTATED FACTORS FOR
206 MEXICAN STUDENTS

                            Factor
Variable              I       II      III      h²
1. Functional       -.16    -.01     .91       85
2. Synonym           .07     .88     [?]       70
3. Superordinate    -.10     .82    -.10       69
4. Contrast          .90     [?]     .16       87
5. Coordinate        .89     .04    -.12       81
6. Faults           -.24     .02    -.64       47
7. Commonality       [?]     .14     .56       85


To this point, the same three idiodynamic
sets have been demonstrated in a noncollege
group of 35-year-old normal men, an acutely
psychotic group of schizophrenic patients
(Moran et al., 1964), a group of college 
freshmen, and a group of Mexican college 
students. The “existence” of such sets seems 
to be well established in the most diverse 
groups of adults. 

The five variables used to measure sets
accounted for one-half to two-thirds of all
associates to the present word list (Tables
5 and 1). Other kinds of idiodynamic sets
may account for additional unscored associates.
In fact, Jung's (1919) “predication
type” is as yet unrepresented in the present
system.


TABLE 7
EFFECT OF IDIODYNAMIC SET UPON COMMONALITY SCORE

                                     Type of stimulus word
                              (average commonality standard score)
 N    Type of subject         Functional   Synonym-        Contrast-
                                           superordinate   coordinate
 32   (F) Object-referent        .31          -.23            -.25
 34   (SS) Concept-referent      [?]           .52            -.55
 35   (CC) Speed set             .04           .02            1.10

Note.-Type of subject represented by subjects
with z score greater than .50 on variables representing
one set and less than .50 on variables
representing the other two sets, based upon first
44 associations. Type of stimulus word each represented
by 12 words that had elicited predominantly
one type of response from 196 men; these
were the last 36 stimulus words in the list.


A PERCEPTUAL-REFERENT ASSOCIATIVE SET


One group among the normal subjects of 
Jung's (1919) word-association experiments
evidenced a very strong tendency to give
noun responses to adjective stimulus words. 


This preference for the noun arises from the endeavour
of the predicate type to react chiefly by
attributes. Our figures show, not merely that a 
predicate is thus reacted but also, inversely a noun 
is given to the adjective when this is the stimulus 
word [pp. 166-167]. 


Jung regarded the predicate type as a per- 
sonality trait, observing that 


the predicate attitude is not accidental but corre- 
sponds to a definite psychological disposition, 
which is maintained even when other kinds of 
reactions would be much easier than the predicate 
forms [p. 167]. 


Siipola has investigated the predicate 
type (a-noun) in relationship to a set which 
she considered to be its polar opposite, the 
set to give contrast associates (Dunn, Bliss, 
& Siipola, 1958; Siipola, Walker, & Kolb, 
1955). Siipola et al. (1955) attributed the 
two associative sets to differences in atti- 
tudes toward speed in responding. 


...individuals adopt their own self-defined atti-
tudes toward the speed at which they will operate, 
varying all the way from excessive concern with 
speed to feelings of almost complete freedom from 
time pressure. ... High production of contrast re- 
sponses and low production of a-noun responses 
are characteristic effects of an attitude of time 
pressure regardless of whether this attitude is 
experimentally imposed or is self-induced by the 
subject [p. 450]. 


Other investigators have reported a negative
correlation between predication and
contrast associates, for example, -.45
(Wells, 1912), -.55 (Kelley, 1913), -.69
(Tendler, 1933). In their Table 4, Moran et
al. (1964) reported for contrast associates
correlations of -.60 and -.56 with “intrinsic
predicate” (e.g., RED-apple) and “extrinsic
predicate” (e.g., ROTTEN-apple) associates,
respectively. For the logical coordinates, 
comparable correlations were —.52 and 
—.58, respectively. 

The tendency to give predominantly
predication associates seems to represent a
fourth idiodynamic associative set, characterized
also by slow reaction time and a
marked tendency not to give contrast or
coordinate associates. From the foregoing it
should be predicted that a predication variable
would have a large loading at the opposite
pole of the contrast-coordinate factor.

Procedure 


For purposes of reliability studies, to be re- 
ported later in this paper, a sample of 327 fresh- 
men earlier had been administered the 80-word 
list in the manner described for the sample of 482 
freshmen. This sample of 327 was scored by the 
manual based on 196 men (Moran et al., 1964),
since the manual based on the 482 freshmen had
not yet been constructed. A predication variable
was added to those already scored.

Predication associate. The stimulus word and 
response word are adjective-noun or noun-adjec- 
tive combinations; the stimulus word denotes an 
attribute of the object denoted by the response 
word, or vice versa, for example, RED-apple, or
APPLE-red. 


Findings 


The means, standard deviations, and in- 
tercorrelations of the variables are provided 
in Table 8. Shown in Table 9 are the results 
of a normalized varimax rotation of princi- 
pal component factors. 

Factor I, in Table 9, represents the so- 
called "speed" factor, with contrast-co- 
ordinate associates and predication associ- 
ates at opposite poles, as predicted by 
Siipola. Factors II and III are characterized
by the functional and by the synonym- 
superordinate associates, respectively, with 
coordinate loading —.57 on II, and super- 
ordinate loading only .59 on III. Although 
this factor structure was not as sharply 
defined as those previously reported, the 
same three factors are unmistakable, with 
predication associates clearly negative to 
contrast-coordinate, and orthogonal to the 
other two factors. 

Jung and Siipola both stressed the prominent
role of imagery in predication associates.
“As we have already suggested, the
individuals who belong to the predicate type
have, we assume, primarily vivid inner pictures
. . . [Jung, 1919, p. 160]”; the predication
types “give more adjective-noun associates,
and generally report complex
processes, especially visual imagery, intervening
between stimulus and response
[Dunn, Bliss, & Siipola, 1958, p. 76].” The
associates of predication-type subjects in
the present study (20 subjects who were
high on the predication end of Factor I and
low on the other two factors) were overwhelmingly
perceptual referent. Even on the
last 36 stimulus words, which typically
evoke one predominant associate, these subjects
seldom gave the “typical” associate.
The stimulus word NAIL had elicited from
52% of the normative group the object-referent
associate, hammer. The present
20 predication-type subjects gave instead
a variety of associates such as broken,
board, hard, steel, long, finger, window,
cross, etc. Instead of giving the superordinate
“bird” to CROW (as did 50% of the
normative group), half of these subjects
gave black. Instead of the popular response
“sweet” to SOUR, predication-type subjects
gave pickle (2), milk (2), lemon (3),
orange, etc. Whereas the majority of normative
subjects gave white to BLACK, these
subjects gave associates like night (3),
Negro, dress, Zorro, suit, smoke, etc.


TABLE 8
MEANS, STANDARD DEVIATIONS, AND INTERCORRELATIONS FOR 327 COLLEGE FRESHMEN

                                               Intercorrelations
Variable             Mean       SD       1      2      3      4      5      6      7
1. Predication        4.8      3.4
2. Functional        17.3      5.2     .03
3. Synonym            9.9      3.8    -.21   -.03
4. Superordinate      4.9      2.6    -.24   -.02    .23
5. Contrast           9.6      4.1     [?]   -.02    .00    .13
6. Coordinate        10.4      4.5    -.41   -.36    [?]    .03    .42
7. Faults             6.5      5.3    -.02   -.20   -.94   -.25   -.29   -.27
8. Commonality     1213.3    303.3    -.50    .35    [?]    .37    .07    .33   -.44


TABLE 9
NORMALIZED VARIMAX ROTATED FACTORS FOR
327 COLLEGE FRESHMEN

                            Factor
Variable              I       II      III      h²
1. Predication       .78     .13    -.08       62
2. Functional       -.04     .94     .08       88
3. Synonym           .00    -.16     .80       67
4. Superordinate    -.12     .00     .59       37
5. Contrast         -.89     .05     .06       79
6. Coordinate       -.59    -.57     .18       70
7. Faults            .16    -.17    -.71       56
8. Commonality      -.72     .32     .52       88

The “perceptual” quality of the associates 
is readily apparent. To stimulus words that 
denote objects, these subjects associated a 
perceptible (usually visual) attribute; to 
stimulus words that denote an attribute, 
these subjects associated an object which 
might display the attribute. Although one 
cannot predict from knowledge of a predica- 
tion set the highly probable specific re- 
sponse word to a specific stimulus word (as 
often can be done for object-referent, con- 
cept-referent, and speed-set subjects), one 
can predict that the stimulus-response word 
pair will be descriptive of some object-at- 
tribute relationship. This consistent associa- 
tive tendency might be termed a perceptual- 
referent set. 

In addition to adjective-noun, noun-ad- 
jective associates and a tendency to report 
visual imagery, the perceptual-referent set 
subjects are characterized by very low com- 
monality scores. Jung (1919) commented on 
the large number of “egocentric” associates 
given by such subjects. Wells (1912) reports 
a correlation of —.74 and Tendler (1933) 
one of —.79, between commonality and 
predication associates; the comparable cor- 
relation in the present study was —.50 
(Table 8). The reason for this typically low 
commonality score is readily apparent from 
the scattered variety of object-attribute as- 
sociates, described above, given by perceptual-referent
set subjects.

To characterize the tendency to give
predicate associates as an idiodynamic perceptual-referent
set implies an enduring
personality trait. Jung (1919) conceptual- 
ized the predication-type in this manner: 


From the figures of the distraction experiment it 
can be stated that the predicate type is no merely 
accidental momentary attitude, but corresponds to 
an important psychological characteristic—one 
which is maintained amid altered conditions [pp. 
162-163]. 

Wells (1912) also has reported a correlation 
of .83 for predication associates, in subjects 
retested after a 14-month interval. Reli- 
ability is a critical issue for dispositional 
constructs, such as the postulated idiody- 
namic associative sets, and will be examined 
next. 


RELIABILITY STUDIES 


The present scoring system consists of 

six variables, used to represent four postu- 
lated idiodynamic associative sets, plus a 
fault and a commonality variable. Such a 
system categorized about 80% of all associa- 
tions (Table 8). In passing, it is interesting 
to note that Woodworth (1948) long ago 
intuitively derived a scoring system much 
like this one. Woodworth remarked, 
There does seem to be some psychological basis 
for the fourfold classification suggested; but it is 
doubtful whether any such scheme can win general 
acceptance at the present time [p. 353]. 


An examination of the reliability of this 
system follows. 


Factor Invariance: Replication Studies


The three orthogonal factors, represented
by synonym-superordinate, contrast-coordinate,
and the functional variable, appear to
be exceptionally stable with different word
lists (Moran et al., 1964) and across quite
different subject populations. Introduction
of a predication variable in the preceding
study yielded a bipole predication versus
contrast-coordinate factor.

In order to determine the replicability of
this “new” factor structure, with the additional
predication variable, a different sample
of 353 freshmen was tested with the 80-word
list. The procedures followed with the
new sample of 353 freshmen, except for use
of the scoring manual based upon 482 freshmen
(Table 1) instead of the 196 men, were
identical to those of the preceding study.
Means, standard deviations, and intercorrelations
are provided in Table 10. Table 11
shows the results of a normalized varimax
rotation of principal component factors.

Comparison of the factor loadings in Tables
9 and 11 reveals essentially the same
factors in both analyses, with coordinate this
time having a lower negative loading (-.27)
on the “functional” factor in Table 11.
These results on a different sample of 353
subjects indicate the replicability of factor
structure, with the inclusion of the new
predication variable.


TABLE 10
MEANS, STANDARD DEVIATIONS, AND INTERCORRELATIONS FOR 353 COLLEGE FRESHMEN

                                               Intercorrelations
Variable             Mean       SD       1      2      3      4      5      6      7
1. Predication        5.3      3.2
2. Functional        17.0      5.1     .05
3. Synonym            9.8      3.9    -.23   -.08
4. Superordinate      5.3      2.8    -.21    .14    .31
5. Contrast           8.3      3.9    -.53   -.15   -.62   -.02
6. Coordinate         9.7      4.6    -.50   -.42    .13   -.07    .61
7. Faults             6.1      4.9     .07   -.19   -.28   -.16   -.32   -.33
8. Commonality     8065.7   1935.1    -.45    .26    .38    .29    .62    .44   -.46


TABLE 11
NORMALIZED VARIMAX ROTATED FACTORS FOR
353 COLLEGE FRESHMEN

                            Factor
Variable              I       II      III      h²
1. Predication       .68     .06    -.21       53
2. Functional        .24     .90    -.01       87
3. Synonym          -.14    -.07     .84       73
4. Superordinate     .02     .20     .76       62
5. Contrast         -.89     .10    -.14       82
6. Coordinate       -.86    -.27     .00       81
7. Faults            [?]    -.50    -.21       49
8. Commonality      -.69     .48     .33       82


This replicability is further demonstrated
in a reanalysis of the associates of the 206
Spanish-speaking subjects, first shown in
Table 6. For the present reanalysis, associates
were scored also for predication responses,
and refactored, with results as
shown in Table 12. Even when the word-association
experiment was conducted entirely
in Spanish, the three factors appeared,
with the predication variable again principally
on the opposite end of the contrast-coordinate
factor.


TABLE 12
NORMALIZED VARIMAX ROTATED FACTORS FOR
206 MEXICAN COLLEGE STUDENTS

                            Factor
Variable              I       II      III      h²
1. Predication       .67     [?]    -.42       72
2. Functional        .13     .84     .07       73
3. Synonym          -.04     .16     [?]       62
4. Superordinate     .05    -.13     .83       71
5. Contrast         -.89     .20    -.19       87
6. Coordinate       -.86    -.05    -.02       74
7. Faults            .15    -.72     .08       54
8. Commonality       [?]     .57     [?]       85


Factor Invariance: Men versus Women


The 327 freshman sample (Tables 8 and 
9) was divided into two samples of 173 
men and 154 women, and each sample was 
factor analyzed separately. Two sets of fac- 
tor scores were calculated for the men; one 
set of factor scores was based upon beta 
weights derived from the factor analysis on 
the men, but the other set of factor scores, 
for the same men, was based upon the beta 
weights derived from the independent factor 
analysis on the women. Correlations between
the two sets of factor scores were: I.
perceptual-referent and speed-set factor, .94;
II. object-referent set factor, .98; III. concept-referent
set factor, .98. These correlation
coefficients are of the same order as the
correlations obtained when the same sample
is tested on two occasions, as described below.
Clearly, sex of the subjects did not affect
factor structure.


Factor Invariance: Test-Retest


Three months after the first testing, the 
same word list was administered, in reverse 
order, to 195 of the 327 freshman sample.
Results of first testing and second testing 
were factored separately for the 195 sub- 
jects. Two sets of factor scores were calcu- 
lated for the first test performance only. 
One set of factor scores on the first perform- 
ance was based upon the beta weights de- 
rived from a factor analysis of the first per- 
formance. Another set of factor scores on the 
same first performance was based upon the 
beta weights derived from the factor analysis
of the later, second performance. Correlations
between the two sets of factor scores
were: I. perceptual-referent and speed-set 
factor, .97; II. object-referent set factor, 
.96; III. concept-referent set factor, .99. 
Thus, the factor structures were so similar 
on the two occasions, 90 days apart, that it 
actually made little difference which struc- 
ture was used to calculate factor scores. 
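
The logic of this invariance check can be sketched in Python. The sketch is only an illustration: the data and weights below are hypothetical, the helper names `factor_scores` and `pearson` are not from the study, and a single factor is shown where the study used three. The same standardized first-test scores are weighted twice, once with weights from each factor analysis, and the two resulting sets of factor scores are correlated.

```python
from math import sqrt


def factor_scores(rows, weights):
    """Weighted sum of standardized variable scores for each subject."""
    return [sum(w * x for w, x in zip(weights, row)) for row in rows]


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical standardized scores of five subjects on three variables,
# all taken from the FIRST test performance.
first_test = [
    (1.2, -0.3, 0.5),
    (-0.8, 0.9, -1.1),
    (0.1, 0.4, 0.0),
    (-1.5, -1.0, 0.7),
    (1.0, 0.0, -0.1),
]

# Hypothetical beta weights for one factor, from two separate analyses.
weights_from_first = (0.60, 0.10, 0.45)
weights_from_second = (0.55, 0.15, 0.50)

scores_a = factor_scores(first_test, weights_from_first)
scores_b = factor_scores(first_test, weights_from_second)
r = pearson(scores_a, scores_b)
print(f"correlation between the two sets of factor scores: {r:.3f}")
```

A correlation near 1.00 indicates that the two factor structures rank the same subjects in nearly the same way, which is the sense in which it "made little difference" which structure was used.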

Interrelationships among the eight varia- 
bles discussed so far appear to be best repre- 
sented, at least for adult subjects, by the 
three orthogonal factors described above. 
Factor structure, per se, seems to be highly 
reliable. The next step, however, involves an
attempt to better understand specific as- 
sociative tendencies of individual subjects, 
guided by the score patterns on the three 
factors. It thus becomes important to deter- 
mine the reliability of subjects, with respect 
to scores on these factors. 


Reliability of Factor Scores 


To determine the split-half reliability of 
factor scores, associates to List A and to 
List B were factored separately for the 
sample of 327 freshmen. Factor scores de- 
rived from the two different factor struc- 
tures were then correlated, and corrected 
by the Spearman-Brown formula for an 80- 
word list. The two sets of factor scores on I. 
perceptual-referent and speed set correlated 
.84; on II. object-referent set, .69; and III. 
concept-referent set, .83. Thus, subjects
tended to maintain relative rank positions
on the three dimensions represented by the
three factors, on both word lists.


TABLE 13
ROTATED ORTHOGONAL FACTOR LOADINGS OF FACTOR SCORES FOR 195 SUBJECTS

Test session   Variable                        Factor I    II     III     h²
First          1  I (Perceptual-speed)            .94    -.01    -.01     89
First          2  II (Object-referent)            .03     .92     .03     84
First          3  III (Concept-referent)         -.01    -.05     .89     80
                        (90-day interval)
Second         4  I (Perceptual-speed)            .94     .08    -.05     88
Second         5  II (Object-referent)           -.01     .92    -.02     85
Second         6  III (Concept-referent)         -.05     .06     .89     79
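
The Spearman-Brown correction applied to the split-half correlations above follows the standard formula r_full = k r / (1 + (k - 1) r), where k is the factor by which the test is lengthened (k = 2 when two 40-word halves are stepped up to the 80-word list). The following is a minimal Python sketch; the function names are illustrative, and the uncorrected half-test correlations printed at the end are back-calculated from the corrected coefficients reported in the text, not values given by the study.

```python
def spearman_brown(r_half, length_factor=2.0):
    """Step a part-test correlation up to the reliability of a test
    that is `length_factor` times as long."""
    return length_factor * r_half / (1.0 + (length_factor - 1.0) * r_half)


def implied_half_r(r_full, length_factor=2.0):
    """Invert the correction: the part-test correlation implied by a
    corrected coefficient."""
    return r_full / (length_factor - (length_factor - 1.0) * r_full)


# Corrected split-half coefficients reported for the three factors.
reported = {
    "I. perceptual-referent and speed set": 0.84,
    "II. object-referent set": 0.69,
    "III. concept-referent set": 0.83,
}

for factor, r_full in reported.items():
    r_half = implied_half_r(r_full)
    print(f"{factor}: corrected {r_full:.2f}, implied List A-List B r {r_half:.2f}")
```

The correction always raises a positive correlation, so the reported coefficients describe the reliability expected of the full 80-word list rather than the raw agreement between the two 40-word lists.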

Three months after the first testing ses- 
sion, 195 of the 327 sample had been re- 
tested. These data were used above in the 
examination of factor structure. In the 
present analysis, factor scores were calcu- 
lated separately from factor analyses of 
each separate testing, to determine whether 
subjects maintained relative rank position 
on two different occasions, that is, test- 
retest reliability of subjects on the three 
factors. The six factor scores, three from
each testing, were treated as variables and 
factor analyzed jointly, with results as 
shown in Table 13. It may be seen that the 
three factors were almost perfectly orthog- 
onal on both occasions. Since individual 
subject factor scores were employed, the 
similarity of factor loadings on both oc- 
casions reflects directly the test-retest con- 
sistency of the subjects. Correlations be- 
tween factor scores derived separately from 
first and second test factor analyses were: 
I. perceptual-referent and speed set, .75; II.
object-referent set, .65; III. concept-referent
set, .58.


Reliability of Individual Variables 


A comparison of performances on Lists 
A and B, based upon the 327 sample, is 
provided in Table 14. Means and standard 
deviations on the eight variables were 
roughly comparable for the two lists. Considering
the low mean frequency of some
associates, the split-half reliability coeffi- 
cients for the eight variables indicate rea- 
sonable consistency of subjects on the two 
lists. 

When the same word list was readminis- 
tered (in reverse sequence) to a sample of 
195 subjects after a 90-day interval the 
only appreciable mean difference was a 
lower number of response faults and a 
slightly higher commonality score, on second
testing, as shown in Table 15. A general,
but very slight, drop in standard deviations
on second testing also may be seen in Table
15. The test-retest correlation on faults was
low (.49) and for superordinate, very low
(.36). Otherwise, the test-retest correlations
of individual variables, shown in Table 15,
indicate moderate consistency in performances
90 days apart, the coefficients ranging
from .62 to .71.


TABLE 14
MEANS, STANDARD DEVIATIONS, AND INTERCORRELATION OF LIST A AND LIST B, FOR
327 COLLEGE FRESHMEN

                          List A              List B
Variable               Mean      SD        Mean      SD       r_tt
1. Predication          2.3     1.0         2.5     1.9       [?]
2. Functional           8.9     3.0         8.3     3.1       .61
3. Synonym              5.3     2.2         4.6     2.1       .67
4. Superordinate        2.2     1.6         2.7     1.6       .56
5. Contrast             4.7     2.2         4.9     2.4       .79
6. Coordinate           5.3     2.5         5.0     2.6       .78
7. Faults               3.0     2.6         3.5     3.1       .82
8. Commonality        627.8   165.8       585.5   162.1       .83

Note.-r_tt = Pearson product-moment correlation coefficient, corrected by
Spearman-Brown formula for total test, 80 words.


TABLE 15
MEANS, STANDARD DEVIATIONS, AND INTERCORRELATION OF 90-DAY TEST-RETEST SCORES
OF 195 COLLEGE FRESHMEN

                          1st Test            2nd Test
Variable               Mean      SD        Mean      SD        r
1. Predication          4.9     3.5         4.2     3.3       .68
2. Functional          17.4     5.2        16.7     4.7       .66
3. Synonym             10.0     3.8        10.1     3.4       .62
4. Superordinate        4.8     2.7         5.2     2.3       .36
5. Contrast            10.0     3.9        10.3     3.8       .71
6. Coordinate          10.5     4.6        11.4     4.3       .63
7. Faults               6.1     4.9         4.0     3.7       .49
8. Commonality       1233.3   263.9      1293.2   266.1       .70

In the studies that have been described 
herein, and in Moran et al. (1964), replicability
of the idiodynamic associative set
phenomena seems to have been reasonably 
well established. By following the same, or 
similar, “operations” as outlined in these 
studies, other investigators ought to be able 
to reproduce these set phenomena at will. 
In what follows, task attitudes (Siipola et 
al., 1955) are investigated in relation to the 
individual differences in linguistic habits 
which give rise to idiodynamic associative
sets. 


TASK ATTITUDES AND IDIODYNAMIC SETS


Siipola has demonstrated what appears to 
be a reciprocal relationship between predi- 
cation and contrast associates, varying as a 
function of the subject’s attitude toward 
speed in responding. For example, she has 
shown that when subjects were placed under 
greater time pressure the number of contrast 
associates increased markedly (from 174 to 
442) and the number of adjective-noun as- 
sociates decreased (from 705 to 392) just 
as markedly (Siipola et al., 1955). The bi- 
pole Factor I in Tables 9, 11, and 12, with 
predication at one end and contrast-coordi- 
nate at the other, is compatible with Sii- 
pola’s view. It need not follow from these 
findings, however, that perceptual-referent- 
set subjects shift to give contrast associates 
under time pressure, or that speed-set subjects
shift to give predication associates
when time pressure is lifted. It will be recalled
that Siipola's word list was “loaded”
with popular contrast-evoking stimulus 
words. As indicated earlier, it may be that 
time pressure moved people to give more 
popular associates and that the “category” 
into which such associates fall is deter- 
mined by the experimenter's selection of 
stimulus words. Also, even if time pressure 
did induce people in general to shift “cate- 
gory" of associate, those people with strong 
idiodynamic sets might not do so. In fact,
Jung (1919) found his predicate-type sub- 
jects to be practically impervious to various 
forms of distraction. 


For the sake of comparison we have placed by the 
side of the predicate type the average of all other 
types. The difference is striking. In distraction the 
predicate type shows no change worth mentioning. 
The predicate type does not dissociate its atten- 
tion, whilst all other types prove themselves, to 
some extent at least, accessible to the disturbing 
stimulus. This fact is extremely remarkable [p. 160]. 


Jung (1919) stressed the prominent role of 
imagery in predicate-type subjects. 


The predicate type is unable to dissociate his at- 
tention because his primary vivid inner pictures 
make such a demand upon his attention that 
inferior associations (which make up the distrac- 
tion phenomenon) cannot arise [p. 162]. 


Siipola also observed that such subjects
“generally report complex processes, especially
visual imagery, intervening between
stimulus and response [Dunn et al., 1958, p. ].”

The present experiment was designed to
examine the effects of time pressure upon
idiodynamic associative sets and upon imagery.


Procedure 


Two months after the first testing, 240 of the 
353 sample (Tables 10 and 11) were retested in the 
following manner. Subjects were instructed, as 
before, to write down the first word that came to 
mind when they heard the stimulus word. They 
were told that the words would be read in rapid 
order, and they would have to write quickly. If 
no word came to mind, they were to leave the 
space blank. The 40 words in List A were then 
read at 4-second intervals. The subjects then were 
told that stimulus words would be read very 
slowly and that they would have a great deal of 
time to write down the word that came to mind 
when they heard the stimulus word. If no word 
came to mind they were to leave the space blank. 
The 40 words in List B were then read at 8-second 
intervals. After both lists had been administered, 
the subjects were asked to review their first 40 
associates and to estimate the number of times an 
associate had been accompanied by an image (a
mental picture). Several minutes were allowed for
the review, then subjects were asked to record the 
number of images at the top of the page. The 
same procedure was followed to obtain number of 
images on the second 40 associates. 

Later, a comparable freshman class of 246 students,
taught by the same instructor, was administered
the two word lists by the same examiner.
Procedure was identical to that just described, except
that List B was read first at 4-second intervals,
followed by List A at 8-second intervals. To
equate the two samples, 6 subjects were discarded 
from the 246 sample. 

In the sample of 353, tested 2 months earlier, 
subjects had been asked to record the number of 
images experienced on the 80-word list. These
data also were brought to bear in the present
analysis.


Findings on the Total Sample 


Overall mean differences under the 4-sec- 
ond and 8-second condition are shown in 
Table 16. Also provided are the means of a 
sample of 353, which had associated to both 
List A and List B at 5-second intervals.
This comparison sample is useful in differ- 
entiating initial list differences from differ- 
ences due to testing condition. The statisti- 
cal significance values reported, however, 
are based upon the total 480 sample, for 
whom the lists were counterbalanced. 

Mean differences under the two testing 
conditions were generally very slight, as 
may be seen in Table 16, but the results 
corroborate the findings of Siipola et al. 
(1955). The frequency of contrast-coordinate
associates was significantly higher, and
the incidence of predicate associates and 
reported imagery was significantly lower 
under the 4-second, time-pressure condition. 
A slight increase in commonality and de- 
crease in synonym associates under time 
pressure also were statistically significant. 

The most impressive effect of time pressure
was that upon reported imagery. Of
the 194 in the first sample who reported a
difference in incidence of imagery under the
two conditions, 146 reported a decrease under
the 4-second condition; of the comparable
212 in the second sample (with word
lists reversed), 164 reported a decrease under
the 4-second condition. The incidence
of reported imagery on List A correlated
.71 with that reported on List B. Two-month
test-retest correlations between reported
imagery on the 80-word list read at
5-second intervals and that on the 40-word
lists read at 4-second and 8-second intervals
(first 240 sample) were .43 and .38, respectively.


TABLE 16
MEAN CHANGES UNDER 4- AND 8-SECOND CONDITIONS

                   Sample of 353 (a)       First sample     Second sample    Total sample
                                             of 240            of 240          of 480
                 List B  List A   Total   List B  List A   List A  List B
Variable          5 sec   5 sec   5 sec    8 sec   4 sec    8 sec   4 sec    8 sec   4 sec  p (b)

Predication         2.7     2.6     5.3      3.1     2.1      2.4     2.2      5.4     4.3    .01
Functional          8.3     8.7    17.0      8.6     8.4      8.2     7.9     16.2    16.3   n.s.
Synonym             5.0     4.8     9.8      5.0     4.8      5.3     4.7     10.2     9.5    .02
Superordinate       2.8     2.5     5.3      3.1     2.6      2.7     3.2      5.8     5.8   n.s.
Contrast            4.3     3.9     8.2      4.1     4.0      4.6     5.2      8.7     9.1    .05
Coordinate          4.4     5.2     9.6      4.8     5.4      5.2     5.4     10.0    10.8    .01
Faults              3.2     2.9     6.1      2.3     2.7      1.1     1.4      3.4     4.1    .01
Images                             36.8     20.2    16.5     21.8    18.7     42.0    35.2    .01
Commonality        3657    4409    8066     3623    4622     4348    3954     7971    8576    .02

(a) See Table 10.

(b) Significance level determined by two-cell chi-square, comparing observed frequency of increased
versus decreased scores under 4- and 8-second condition, with expected frequency of 50-50 (%);
significance levels are for two-tailed test, N = 480.
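The two-cell chi-square named in the note to Table 16 may be illustrated with the imagery counts just reported. What follows is a minimal sketch rather than the original computation: it assumes no continuity correction, it counts every subject who reported a difference but not a decrease as an increase, and it treats each sample of 240 separately instead of the pooled 480. The function name and the constant for the critical value are introduced here only for illustration.

```python
def two_cell_chi_square(n_decrease, n_increase):
    """Chi-square (1 df) for decrease/increase counts against a 50-50 split.

    Assumption: no continuity correction is applied.
    """
    total = n_decrease + n_increase
    expected = total / 2.0  # 50-50 expectation for each cell
    return ((n_decrease - expected) ** 2 + (n_increase - expected) ** 2) / expected


# Counts from the text: subjects reporting a change in imagery.
first_sample = two_cell_chi_square(146, 194 - 146)    # first sample of 240
second_sample = two_cell_chi_square(164, 212 - 164)   # second sample, lists reversed

# Standard critical value of chi-square with 1 df at the .01 level (two-tailed).
CRITICAL_01 = 6.635

for label, value in (("first", first_sample), ("second", second_sample)):
    verdict = "exceeds" if value > CRITICAL_01 else "does not exceed"
    print(f"{label} sample: chi-square = {value:.2f} ({verdict} {CRITICAL_01})")
```

Under these assumptions both samples depart from a 50-50 split far beyond the .01 criterion, which is in keeping with the entry for Images in Table 16.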

The very dramatic shift from predicate to 
contrast associates under time pressure, as 
reported in the Siipola et al. (1955) study, 
was much less dramatic when repeated 
with word lists that were not loaded with 
contrast-evoking stimulus words. Neverthe- 
less, the associative shifts of people in gen- 
eral were as predicted by Siipola. It will be 
of interest now to examine the effects of 
time pressure upon idiodynamic associative 
sets. 


Findings on Idiodynamic Sets 


Results on the two 240 samples were fac- 
tored separately for the 4-second and 8- 
second condition, yielding four independent 
factor analyses. There were no differences
in factor structure attributable to differ- 
ences in testing conditions; the same three 
factors appeared in all analyses. To deter- 
mine effects of time pressure, subjects were 
selected to represent a set by their factor 
scores on the 8-second condition and then
the factor scores on the 4-second condition
were examined. Results on the two 240 
samples are shown separately in Table 17, 
to provide a replication with word lists 
reversed. 

Perceptual-referent set. The 36 subjects 
in the first 240-subject sample who had high 
scores (>.50) on Factor I and low scores
(<.30) on Factors II and III in the 8-second
condition, showed virtually no
change in factor scores based upon the 4-second
condition (Table 17). The stability
of the set to give predicate associates was, 
in the words of Jung, “extremely remark- 
able.” Results on the 30 subjects in the 
second 240-subject sample, with lists re- 
versed, were somewhat less consistent. De- 
spite the tendency of this group to shift to 
more “functional” associates (.30), however,
the set to give predicate associates
remained clearly predominant (.96).

Examination of score changes for the
total 66 perceptual-referent set subjects on
the individual variables showed, under the
4-second condition, a decrease in predicate
associates and reported imagery, with a
concomitant increase in associates of the
other “set” categories, and in commonality.
As shortly will be seen, these changes are
typical of the other idiodynamic sets as
well: a decrease in set-representative associates,
with an increase in popular associates
of the other categories. It would appear
from the evidence in Table 17 that the
perceptual-referent set was the least disturbed
by time-pressure instructions.


TABLE 17
FACTOR SCORE CHANGES UNDER 4- AND 8-SECOND CONDITIONS

                                        Factor scores of subjects with set

                                  From first sample of 240                            From second sample of 240
                                  List B, 8 seconds     List A, 4 seconds             List A, 8 seconds     List B, 4 seconds
Set                           N       I     II    III       I     II          III  N       I     II    III       I     II    III

I   Predication (+)          36    1.19   −.65   −.68    1.03   −.60  [illegible] 30    1.29   −.72   −.79     .96    .30   −.33
I   Contrast-coordinate (−)  28   −1.28   −.69   −.70    −.82   −.60          .03 35   −1.06   −.59   −.52    −.56   −.43    .25
II  Functional               12     .02   1.01   −.64    −.08   −.04          .13 13     .01   1.32   −.49    −.24    .09   −.04
III Synonym-superordinate    15    −.02   −.49   1.04     .17   −.03          .14 17    −.01   −.33   1.16    −.23    .04    .37

Speed set. The 28 subjects chosen from 
the first 240 sample to represent the speed 
set (contrast-coordinate) under the 8-sec- 
ond condition, maintained the same set un- 
der the 4-second condition (Table 17). The 
decrease in set-representative associates 
(from —1.28 to —.82) was larger than that 
found with perceptual-referent set subjects,
and a sharp increase in synonym-super- 
ordinate associates (from —.70 to .03) was 
notable. But factor scores under the 4- 
second condition remained well within the 
limits specified for speed-set subjects. Re- 
sults on 35 subjects, chosen in the same 
manner from the second sample of 240, 
were about the same, with a somewhat 
larger decrease in set-representative associ- 
ates (from —1.06 to —.56) and an increase 
in synonym-superordinate associates (from 
—.52 to .25). Overall, it may be said that 
speed set was “weaker,” but still much in 
evidence under the 4-second condition. 

Inspection of score changes on the indi- 
vidual variables for the total 63 speed-set 
subjects revealed the same pattern of shift- 
ing as that noted in the perceptual-referent 
set subjects. In the 4-second condition, the 
frequency of set-representative contrast- 
coordinate associates dropped from 14.2 to 
11.1, frequency of associates in the other 
categories increased (except predication), 
with an increase in commonality, and a 
decrease in reported imagery. 

The speed set was so called mainly be- 
cause of evidence that when people in gen- 
eral were placed under time pressure in the 
word-association experiment, the frequency 
of contrast-coordinate associates markedly 
increased (Siipola et al., 1955). This phe- 
nomenon was demonstrated also, to a lesser 
degree, in Table 16. Contrast-coordinate as- 
sociates also tend to have fast reaction time 
(Moran et al., 1964). On this basis, it was 
inferred that some subjects had their own 
enduring, built-in set for speed and, for 
this reason, consistently behaved as people 
in general do when under explicit time-pres- 
sure instructions: that is, they produce 
many contrast-coordinate associates. 


The data presented above are incompati- 
ble with such a rationale. The linguistic 
habit which underlies the set to give con- 
trast-coordinate associates was disrupted, 
rather than facilitated, by explicit time- 
pressure instructions. It does not seem likely 
that these subjects initially were set for 
speed in responding, and therefore tended, 
only incidentally, to give many fast re- 
action-time contrast-coordinate associates 
(Moran et al., 1964). Rather, it would appear
that this idiodynamic associative set
promotes, specifically, contrast-coordinate 
associates. The enduring attitude which 
might best account for this set is more 
likely one toward words in general, rather 
than one involved with rate of responding 
in the word-association experiment. Since 
both contrasts (bipole) and coordinates are 
dimensional terms, this set might be termed 
a dimension-referent set. The aptness of 
this name may become more apparent as 
data on this set accumulate later in this 
report. 

Object-referent and concept-referent sets. 
The 12 subjects in the first 240-subject 
sample who were chosen by 8-second condi- 
tion factor scores to represent the object- 
referent set showed no set at all in the 
4-second condition. This finding was re- 
peated with 13 subjects from the second 
240-subject sample. Essentially the same 
result was obtained with the 32 concept- 
referent subjects, although some vestige of 
set remained (set factor score of .37) in the
4-second condition, with subjects drawn 
from the second 240-subject sample (Table 
17). 

Time pressure seemed to have the effect 
of dissolving object-referent and concept- 
referent associative sets. For example, under
the 4-second condition concept-referent-set
subjects gave about the same number of
“functional” associates as did the object- 
referent subjects, the two set types gave 
superordinate associates with equal fre- 
quency, and both set types gave even more 
coordinate associates than the dimension- 
referent-set subjects. Considering the 90- 
day test-retest reliability of object-referent 
(.65) and concept-referent (.58) sets under 
the 5-second condition, the effect of time 
pressure, as shown in Table 17, was quite
marked. At present, little may be added to 
this descriptive account. 

Although the effects of time pressure upon 
different idiodynamic sets varied greatly in
degree, all four sets evidenced the same 
general change pattern under the 4-second 
condition: (a) a decrease in frequency of 
the category of associates representative of 
the set, (b) an increase in frequency of as- 
sociates representative of the other sets, 
except predication, (c) an increase in com- 
monality, (d) a decrease in frequency of 
predicate associates, and (e) a decrease in 
incidence of reported imagery. 

The 186 subjects who participated in the 
above analysis of effects of speed instruc- 
tions upon sets have been treated, to this 
point, as the 39% of “deviate” cases in the 
sample of 480. This treatment was, of 
course, quite arbitrary. Any number of sub- 
jects could have been included in the “de- 
viate” or “nondeviate” samples simply by 
changing cutting scores. But the purpose of 
this series of studies has been to determine 
the generality, reliability, and characteristic
features of specific idiodynamic sets,
and the selection of “extreme” cases by 
means of arbitrary cutting scores has helped 
to serve this purpose. Findings on these sub- 
jects may now be generalized to the other 
subjects. 
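The dependence of “deviate” status upon the cutting scores may be made concrete. The sketch below applies the selection rules stated in this report (high score beyond .50 on the defining factor, scores under .30 on the others, and Factor I between .30 and −.30 for the object-referent and concept-referent sets). The function name and its parameters are hypothetical conveniences, and the inputs shown are group means from Table 17, whereas actual selection was made on the factor scores of individual subjects.

```python
def classify_set(f1, f2, f3, high=0.50, low=0.30):
    """Return the idiodynamic set implied by three factor scores, or None.

    `high` and `low` are the arbitrary cutting scores; changing them
    changes which subjects count as "deviate" cases.
    """
    if f1 > high and f2 < low and f3 < low:
        return "perceptual-referent"
    if f1 < -high and f2 < low and f3 < low:
        return "dimension-referent"
    if f2 > high and f3 < low and -low < f1 < low:
        return "object-referent"
    if f3 > high and f2 < low and -low < f1 < low:
        return "concept-referent"
    return None


# Group means under the 8-second condition (Table 17), used only as examples.
print(classify_set(1.19, -0.65, -0.68))   # perceptual-referent
print(classify_set(-1.28, -0.69, -0.70))  # dimension-referent
print(classify_set(0.01, 1.32, -0.49))    # object-referent
print(classify_set(-0.01, -0.33, 1.16))   # concept-referent

# The same profile falls in or out of a set as the cutting score moves.
print(classify_set(0.45, 0.10, 0.10))             # None
print(classify_set(0.45, 0.10, 0.10, high=0.40))  # perceptual-referent

# Proportion of the 480 subjects treated as having a set.
print(round(186 / 480, 2))  # 0.39
```

Lowering `high` or raising `low` enlarges the “deviate” group and the reverse shrinks it, which is the sense in which the 39% figure is arbitrary.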


BASES FOR MATCHING WORD PAIRS


The high reliability of three orthogonal 
word-association factor scores on 4 succes- 
sive days, with different stimulus words 
each day, led to the postulation of three 
idiodynamic associative sets (Moran et al., 
1964). Obviously, persons consistently high 
on only one of the three factors had to be 
evidencing a strong tendency, or set, to give 
predominately those categories of associate 
which loaded highest on that factor. Subse- 
quent studies were centered upon only those 
persons who evidenced a strong set. But the 
three-factor structure, which has appeared 
with great fidelity in quite different kinds of 
subject populations, represents the orthog- 
onal dimensions which most parsimoniously 
account for variabilities in the multiple 
measures taken upon the total sample. 


Factors as Generic Dimensions 


To illustrate, what would happen to the 
three-factor structure if all subjects with 
sets (as defined in preceding studies) were 
removed from the sample, and results on the 
residual subjects factored? The results of 
such an analysis may be seen in Table 18. 
The 144 subjects with idiodynamic sets were 
removed from the 353 sample (Table 10), 
and the results of the residual 209 subjects 
were factor analyzed. 

As indicated by the results in Table 18, 
removal of set “types” had little effect upon 
factor structure. Using the same cutting 
scores, all set types were then removed from 
the sample of 209 and the residual factor 
analyzed, again yielding the same three 
factors. This process was repeated until 223 
subjects identified as types had been re- 
moved, and the residual sample of 130 was 
factor analyzed. The end product of this 
iterative process is presented in Table 19. 

As shown in Table 19, removal of the deviate
subjects did not have an appreciable
effect upon factor structure. An analogy
may be made here to the Thurstone (1957)
“box problem.” Multiple linear measurements
taken on a sample of boxes will yield
a three-factor solution: I, length; II, width;
and III, height. If every box characterized
by arbitrary cutting points as high on one
factor but low on the other two factors was
removed, a factor analysis of the “residual”
sample of boxes would still yield the same
three factors. This process could be repeated
indefinitely. Analogously, the three word-association
factors represent more generic
dimensions of a domain partially tapped in
a variety of different ways by the multiple
individual variables employed in the preceding
experiments.


TABLE 18
NORMALIZED VARIMAX ROTATED FACTORS OF
353 SAMPLE, AFTER REMOVAL OF 144
SUBJECTS WITH SETS, N = 209

                                    Factor
Variable                      I           II          III     h²

1. Predication              .68          .10         −.15    .49
2. Functional               .18          .88         −.08    .82
3. Synonym                 −.20         −.20          .82    .76
4. Superordinate            .10          .32          .70    .60
5. Contrast                −.89          .15         −.11    .82
6. Coordinate              −.84         −.26          .02    .78
7. Faults                   .36  [illegible]          .20    .50
8. Commonality      [illegible]          .42  [illegible]    .84

Note. Subjects with sets were selected as
follows: Perceptual-referent set: Factor I score
>.50, II and III <.30; dimension-referent set:
Factor I score <−.50, II and III <.30; object-referent
set: Factor II score >.50, III <.30, I
between .30 and −.30; concept-referent set:
Factor III score >.50, II <.30, I between .30 and
−.30.


TABLE 19
NORMALIZED VARIMAX ROTATED FACTORS OF
353 SAMPLE AFTER REMOVAL OF 223
SUBJECTS WITH SETS, N = 130

                                    Factor
Variable                      I           II          III     h²

1. Predication              .67  [illegible]         −.24    .52
2. Functional               .26          .82         −.14    .76
3. Synonym                 −.19         −.16          .82    .74
4. Superordinate            .14          .33          .74    .67
5. Contrast                −.86          .18         −.17    .80
6. Coordinate              −.85         −.19         −.03    .76
7. Faults                   .21         −.67          .17    .53
8. Commonality             −.71          .50          .25    .82

Note. Subjects with “sets” were selected as
follows: Perceptual-referent set: Factor I score
>.50, II and III <.30; dimension-referent set:
Factor score on I <−.50, II and III <.30; object-referent
set: Factor II score >.50, III <.30, I
between .30 and −.30; concept-referent set:
Factor III score >.50, II <.30, I between .30 and
−.30.
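Because the three factors are orthogonal, the communality of each variable should equal the sum of its squared loadings, and this provides a simple check on the entries of Table 19. The sketch below is offered only as an illustration under that standard assumption; the dictionary and function names are hypothetical, the loadings are those rows of Table 19 that are fully legible, and communalities are read as two-decimal values (for example, .76).

```python
# Loadings (Factor I, II, III) for the fully legible rows of Table 19.
TABLE_19_LOADINGS = {
    "Functional": (0.26, 0.82, -0.14),
    "Synonym": (-0.19, -0.16, 0.82),
    "Superordinate": (0.14, 0.33, 0.74),
    "Contrast": (-0.86, 0.18, -0.17),
    "Coordinate": (-0.85, -0.19, -0.03),
    "Commonality": (-0.71, 0.50, 0.25),
}

# Communalities as printed in the last column of Table 19.
TABLE_19_REPORTED = {
    "Functional": 0.76,
    "Synonym": 0.74,
    "Superordinate": 0.67,
    "Contrast": 0.80,
    "Coordinate": 0.76,
    "Commonality": 0.82,
}


def communality(loadings):
    """Sum of squared loadings on orthogonal factors."""
    return sum(value * value for value in loadings)


for name, loadings in TABLE_19_LOADINGS.items():
    computed = communality(loadings)
    reported = TABLE_19_REPORTED[name]
    print(f"{name:14s} computed {computed:.3f}  reported {reported:.2f}")
```

With loadings rounded to two places, the computed values agree with the printed communalities to within about .01, so that the bipolar Factor I (predication versus contrast-coordinate) and the other two factors together account for most of the variance in these variables in the residual sample.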


Factors Interpreted as Bases for Matching 
Word Pairs 


The three factors, or dimensions, may be 
interpreted as representing basic ways in 
which one word may be matched with an- 
other word. One argument against such a 
simplified account as this might be antici- 
pated. Many very elaborate categorization 
systems for word associations have been 
published which demonstrate the seemingly 
infinite number of ways in which word pairs 
may be matched. Murphy (1917) for ex- 
ample, published an 87-category system for 
sorting word pairs in the free association 
experiment, with the comment: “I shall
first give it as it stands entire, and then
explain it, in so far as a thing so fearfully
and wonderfully made can be explained [p.
248].” While numberless principles for pairing
words may be conceived, however, the
empirically derived factors in the preceding 
studies suggest a parsimonious set of word- 
matching principles to which most people 
seem to conform. 

1. Words may be matched on the basis of 
object-attribute relationships. The experimenter
names an attribute (e.g., BLUE) and
the subject names an object (e.g., water), 
or vice versa (e.g., WATER, blue). Since the 
completed stimulus-response word pair al- 
ways refers to a perceptible quality of a 
concrete thing, whether the experimenter 
names an attribute or names an object, 
these might be called perceptual-referent as- 
sociates. 

2. Words may be matched on the basis of 
the concrete things they name. The experi- 
menter names a thing, and the subject 
names another thing associated with it, for 
example, SPIDER, web; FOOT, shoe; BOAT,
dock. These might be termed object-referent 
associates: that is, the stimulus word is 
referred to the concrete object which it 
labels, and the response word is the label 
of another concrete object. 

3. Words may be matched lexigraphically.
The experimenter gives a word and the
subject “defines” it. The subject plays “dictionary,”
and gives a synonym or superordinate
of the stimulus word: (stimulus
word) implied “is” or “is a” (response
word). Because the concrete things or actual 
events denoted are irrelevant to the “logi- 
cal” basis of word matching used and since 
one concept is referred to another concept, 
these might be called concept-referent as- 
sociates. 

4. Words may be matched on the basis of 
a common dimension. The experimenter 
names one pole of a continuum (e.g., FAST), 
and the subject names the other pole (e.g., 
slow); or, the experimenter names one entity
(e.g., BLUE), and the subject names
another entity of the same logical order 
(e.g., green, or yellow, or pink, etc.). Since 
the stimulus-response word pair specifies 
or refers to a dimension, these might be 
called dimension-referent associates. 
In all of the samples to which the present
scoring system has been applied, the above 
word-matching bases have appeared in 
three orthogonal factors, with 1 and 4 as 
one bipole factor. One might expect this 
factor structure to be relatively unaffected 
by the language in which the word-associa- 
tion experiment is conducted (as with the 
Spanish-speaking Mexican subjects shown 
in Table 12), since the relationships in- 
volved are expressed in all languages. Sub- 
jects ordinarily utilize all four bases for 
matching words. Most subjects, however, 
also tend to use one basis more than other 
bases. A subject might consistently give 
concept-referent associates, for example, to 
the stimulus words used above to illustrate 
other bases for word matching, for example, 
SPIDER, insect; BLUE, color; FAST, rapid. It
is this tendency to use one word-matching 
basis predominately and consistently that 
has been termed an idiodynamic associative 
set. 
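The correspondence between the scored categories of associate and the four word-matching bases may be set out schematically. The sketch below is an illustration only: it tallies a subject's scored associates by matching basis and reports the basis used most often. The names are hypothetical, and the simple tally stands in for the factor scores by which sets were actually identified in these studies.

```python
from collections import Counter

# Scoring category -> word-matching basis, following the four bases above.
BASIS_OF_CATEGORY = {
    "predication": "perceptual-referent",   # Factor I (+)
    "contrast": "dimension-referent",       # Factor I (-)
    "coordinate": "dimension-referent",     # Factor I (-)
    "functional": "object-referent",        # Factor II
    "synonym": "concept-referent",          # Factor III
    "superordinate": "concept-referent",    # Factor III
}


def basis_counts(scored_associates):
    """Tally associates by matching basis; unscored categories are ignored."""
    counts = Counter()
    for category in scored_associates:
        basis = BASIS_OF_CATEGORY.get(category)
        if basis is not None:
            counts[basis] += 1
    return counts


def predominant_basis(scored_associates):
    """Return the most frequent basis, or None if there is none or a tie."""
    counts = basis_counts(scored_associates)
    if not counts:
        return None
    (top, top_n), *rest = counts.most_common()
    if rest and rest[0][1] == top_n:
        return None
    return top


# SPIDER, insect; BLUE, color; FAST, rapid
print(predominant_basis(["superordinate", "superordinate", "synonym"]))
# concept-referent
```

A subject who uses all four bases about equally yields no predominant basis, which corresponds to the “nondeviate” cases discussed earlier.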


A Hierarchy of Word-Matching Bases 


It is interesting to compare the notion of 
bases for matching words with that of bases 
for matching actual objects, which has been 
investigated extensively. Of the object-sort- 
ing behavior of adults, for example, it is 
reported that concrete (perception-bound),
functional, and abstract (categorical) bases 
show, in that order, an increasing positive 
relationship to intelligence and educational 
level (McGaughran & Moran, 1956). De- 
velopmentally, the three object-matching 
bases also emerge in the above order (Rei- 
chard, Schneider, & Rapaport, 1944). Bru- 
ner & Olver (1965) have generalized these 
three matching bases to a developmental 
sequence in “modes of analyzing events”: 


In short, then, the functional mode of analyzing
events seems to develop before there is a full 
development of the superordinate strategies, and 
one is tempted to speculate that the shift from 
the consideration of surface, perceptible properties 
to more embracing functional properties may be 
the vehicle that makes possible the development of 
efficient and simpler grouping strategies [p. 433]. 


A parallel development in word-word
matching bases would place the idiodynamic
associative sets in the sequence: perceptual
referent, object referent, concept referent.

The concept of “oppositeness,” which underlies
the dimension-referent matching basis,
apparently develops slowly as a “mode
for analyzing events.” Kreezer and Dallenbach
(1929) undertook to train 100 children,
aged 5.0 to 7.5 years, to give opposites to 
25 words. Children aged 5.0-5.5 could give, 
with training, only 40% correct opposites; 
children aged 6.0-6.5, 75% correct opposites; 
children aged 7.0-7.5, 90% correct opposites. 
Rabinowitz (1956), who used the Kreezer 
and Dallenbach training technique with 50 
children, reported a correlation of .61 be- 
tween IQ and ability to learn to give op- 
posites. The dimension-referent associate 
seems to involve a fairly sophisticated 
matching basis. 

In the light of the foregoing, the pattern 
of correlations shown in Table 20 is espe- 
cially interesting. The correlations of the 
predicate variable with each of the other 
set-representative variables are given for 
several samples of subjects. A parallel account
of Piaget’s four periods in the development
of thought, as summarized succinctly
by Carroll (1964), is provided on
the right side of the table.

As indicated in Table 20, each mode has
a progressively greater negative correlation
with predicate associates. The reliability of
this phenomenon, with quite different kinds
of subjects, word lists, and testing conditions,
is substantial. The parallel between
these correlation patterns and the ordered
relationship of these four modes with respect
to intelligence and developmental sequence
suggests a hierarchical ranking in
terms of linguistic sophistication.* Analogies
with object-sorting behavior and coincidence
with Piaget's account of the development
of thought provide perhaps not the
most convincing empirical evidence for
ranking perceptual-referent, object-referent,
concept-referent, and dimension-referent
word-matching bases in a hierarchy of increasing
linguistic sophistication. Some support
for such a view may be found, however,
from several sources.


*For future reference, it might be noted here
that the sum of unscored associates, treated as a
single variable, correlated .32 with predication,
−.40 with contrast, and −.31 with coordinate (N
= 353); hence, such “heterogeneous” unclassified
associates would appear at the base of this hierarchy.
This “variable” correlated .48 with Factor
I, Table 11 (N = 353), and .64 on test-retest
(90-day interval, N = 195).

Vygotsky (1962) has described the de- 
velopment of linguistic sophistication as an 
increasing consciousness of language itself 
as a complex system. Linguistic sophistica- 
tion 


presupposes a hierarchy of concepts of different
levels of generality. Thus the given concept is 
placed within a system of relationships of general- 
ity. The following example may illustrate the func- 
tion of varying degrees of generality in the emer- 
gence of a system: A child learns the word flower, 
and shortly afterwards the word rose; for a long
time the concept “flower,” though more widely 
applicable than “rose,” cannot be said to be more 
general for the child. It does not include and sub- 
ordinate “rose”—the two are interchangeable and 
juxtaposed. When “flower” becomes generalized,
the relationship of “flower” and “rose,” as well
as of “flower” and other subordinate concepts, also 
changes in the child's mind. A system is taking 
shape [pp. 92-93].


These observations by Vygotsky, based 
upon his systematic study of children, supply
empirical substance to Quine’s (1964)
logical concept of “semantic ascent”: 


It is the shift from talk of miles to talk of “mile.” 
It is what leads from the material (inhaltlich) 
mode into the formal mode, to invoke an old 
terminology of Carnap’s. It is the shift from talking
in certain terms to talking about them [p. 271].


This concept of semantic ascent fits the 
hierarchy shown in Table 20: from the per- 
ceptible qualities of a specific thing, to class 
names for such things, to classes of these 
classes, and, lastly, to the dimensions (vari- 
ables) used in the construction of classes. 
Each successive level represents a shift 
from “talking in certain terms to talking 
about them.” 

The position of contrast-coordinates at 
the top of the hierarchy in Table 20 is 
especially interesting, since in one sense, 
these represent dimensional concepts of the 
highest level of abstraction. As Carroll 
(1964) observes, 


Suppose we take any two words at random, say 
tree and stone, and ask a group of people to indicate
in what respects these concepts differ, as indeed
they must. Among some of the answers we are
likely to get are these: A tree is alive, while a stone 
is inert; a tree is relatively flexible, a stone is 
rigid. That is to say, the mention of any two con- 
cepts evokes a series of perceptual or conceptual 
dimensions in which they differ [p. 102]. 


Dimensional concepts are to categorical 
concepts as amino acids are to proteins; the 
composition or meaning of a category is 
given by its relative position on a number of 
dimensions (e.g., Osgood, Suci, & Tannen- 
baum, 1957). Piaget's description of the 
critical role of “reversible operations” in
propositional thinking (Inhelder & Piaget,
1958) and Kendler's (1965) studies of the
slow development of “reversal shift” in discrimination
learning of children also indicate
the significance of dimensional constructs
for sophisticated thinking. Such
cognitive developments may not directly 
influence responses in a word-association 
experiment, but some parallels may be 
noted. For example, Entwistle, Forsyth, and 
Muuss (1964) found that kindergarten chil- 
dren gave predicate and contrast-coordinate 
associates to adjectives in the ratio, 438 to 
104; first-grade children gave about an 
equal number, 291 to 257; in the third 
grade, the ratio heavily favored the con- 
trast-coordinates, 102 to 492; and even 
more so in the fifth grade, 88 to 524.

Whether individual differences in “preference”
for one or another of these word-matching
bases in the word-association
experiment reflect other more general individual
differences in cognitive structure
has yet to be determined, of course. A gen- 
eralized difference in “attitude toward 
words” has been advanced to account for 
the object-referent and concept-referent 
sets: object-referent set subjects were hy- 
pothesized to treat words generally as an 
aggregation of labels for concrete things; 
concept-referent set subjects were hypothe- 
sized to treat words generally as concepts 
within a systematic network of other con- 
cepts (Moran et al., 1964). Schmidt (1965) 
recently tested this rationale by administer- 
ing a list of homonyms to subjects who 
previously had been shown to manifest one 
of the above two sets, on a different word 
list. As predicted, object-referent set subjects
more frequently interpreted the homonyms
as labels for things, for example, NUN,
sister, and the concept-referent set subjects
more frequently as concepts, for example,
NONE, nothing. That linguistic habits such
as these might possibly have quite broad 
implications for the investigation of cogni- 
tive structure is suggested by studies of 
semantic conditioning. Riess (1946) conditioned
an “electro-dermal response” to selected
words in a list, using subjects at four
different age levels. Generalization of the
conditioned response to homophones, anto- 
nyms, and synonyms was then measured. 
He found, for example, more generalization 
to homophones than to synonyms (159 ver- 
sus 129) in the 7.9 age group, but more 
generalization to synonyms than to homo- 
phones (148 versus 52) in the 18.6 age 
group. Riess (1946) concluded that, 


The present experiment has demonstrated that the
relative strength of the semantic gradients does not 
depend on any a priori quality of the stimulus, but 
upon the way in which the whole organism utilizes 
language in its development [p. 151]. 


WORD-ASSOCIATION COMMONALITY AND
IDIODYNAMIC SETS


One final comment on a methodological 
issue. The term “commonality” often is 
used in two quite distinct ways. First, the 
commonality score of an individual subject 
has been used to measure the degree to 
which that individual’s associates conform 
to those given by “normal” subjects. Second, 
the relative frequencies of stimulus-response 
word pairs given by a normative group have
been used as an index of the relative “as- 
sociation strength” (commonality value) of 
the word pairs. Idiodynamic associative 
sets have a marked influence upon both of 
these commonality measures. 

The commonality score of an individual 
subject, as indicated by several studies in 
this report, is a partial function of the individual’s
idiodynamic set and the set compatibility
of the stimulus word list. The
arbitrariness of commonality scores calculated
without regard to this function may
account in part for past failures to relate 
commonality scores to other cognitive or 
more general personality measures (e.g., 
Block, 1960; Jenkins, 1960). 


Use of normative group frequencies of 
stimulus-response word pairs as a measure
of the relative association strengths of the 
word pairs (commonality tables) implies a 
universality of associative tendencies among 
people of the same linguistic community 
(Rubenstein & Aborn, 1960). There does 
seem to be a remarkable consensus in as- 
sociates to certain stimulus words, like 
TABLE, chair; BLOSSOM, flower (Russell
& Jenkins, 1954). Even though the Kent- 
Rosanoff word list happens to contain a 
large number of such words, they are actu- 
ally extremely rare in the English language 
(Johnson, 1956). Most stimulus words 
evoke a wide variety of low-frequency associates.
Since the frequency with which a
word occurs as an associate (Kent-Rosa- 
noff frequency) and the probability of its 
emission in general linguistic usage (Thorn- 
dike-Lorge count) correlate .94 (Howes, 
1957), it follows that the vast majority of 
words would evoke no one particularly 
“popular” associate from a normative 
group. Very likely, then, the commonality 
hierarchies of most words would resemble 
those shown in Table 4: that is, largely 
determined by the proportions of set 
“types” in the normative sample. While the 
averaged frequencies shown in such tables 
may be of use in linguistics, they can be 
misleading as psycholinguistic indexes of 
“association strength” of stimulus-response 
word pairs for persons who use the language. 
This conclusion only reinforces an earlier 
one reached by Schwartz and Rouse (1961):


Applying Whorf's thesis to our results, we could 
say that word-association responses represent “as- 
sociations,” which, when measured across suffi- 
ciently large numbers of people, tend in the aggre- 
gate to constitute a measure of linguistic 
“connections.” ... At present, we have to conclude 
that the use of group norms to study thought 
processes is a risky procedure [pp. 99-100]. 


SUMMARY 


An initial formulation of idiodynamic 
associative sets depicted three groups of
subjects who entered the word-association
experiment with a definite tendency to give 
predominately one class of associate, re- 
gardless of the stimulus words used. One 
group of such subjects tended to give synonym-superordinate,
another to give contrast-coordinate,
and the third to give
“functional” (e.g., FOOT, shoe) associates.

In the present study, two 40-word lists 
were constructed to be equally compatible
with these three sets, that is, lists which 
evoked associates of the above three types 
with equal frequency from a normative 
group. When applied to a sample of 482 
freshmen students, the lists were found to 
be reasonably equated for the three sets. A 
factor analysis revealed the same three fac- 
tors found in earlier studies. Also, as pre- 
dicted, subjects with specific sets were 
shown to achieve their highest commonality 
scores on set-compatible stimulus words. In 
collaboration with Dr. Rafael Núñez, the 
same associative sets and their differential 
effects upon commonality scores were dem- 
onstrated in Spanish-speaking students at 
the National University of Mexico. 

A fourth idiodynamic set, Jung's “predicate
type,” was then investigated in a sample
of 327 freshmen. The same three-factor
structure was found for this sample, with
the predication variable (adjective-noun, 
noun-adjective associates) loading highest 
at the opposite pole of the contrast-coordi- 
nate factor. This study was replicated with 
a sample of 353 freshmen. 

In a series of studies, reliability of factor
structure, factor scores, and the individual
variables representing the four sets was 
determined. Both internal consistency and 
test-retest reliabilities generally were ade- 
quate. Sex differences were found not to 
affect factor structure. 

Following Siipola's work on task attitudes,
the effect of time-pressure instructions
upon sets was examined. Lists A and
B were administered to two samples of 240 
each (with lists reversed in one sample) 
under time pressure (4-second intervals) 
and under relaxed conditions (8-second in- 
tervals). Siipola’s findings of more contrast 
and fewer predication, with lessened imagery,
under time pressure, were confirmed.
Time pressure had the following effects 
upon all sets: (a) fewer set-representative 
associates, (b) more associates of the other 
set-types, (c) increase in commonality, (d) 
decrease in predication associates, and (e) 
decrease in reported imagery. These findings 
led to a redefinition of the so-called speed 
set (contrast-coordinate) as a dimension- 
referent set. The perceptual-referent (predi- 
cate) set was little affected by time 
pressure; the dimension-referent set was 
“weakened” somewhat; but the object-ref- 
erent (functional) and concept-referent 
(synonym-superordinate) sets virtually dis- 
appeared under time pressure. 

The four sets were discussed as generally 
used bases for matching words, similar to 
the bases used in object-sorting tasks; and 
a hierarchy of such word-matching bases, 
in order of increasing “linguistic sophistica- 
tion,” was proposed. 

Concluding remarks concerned the arbitrariness
of word-association commonality
norms as a measure of “association strength” 
of stimulus-response word pairs. 


APPENDIX 


The 40 stimulus words in each of the two 
lists used in the present studies are given below. 
They have served a useful purpose in these 
studies, but at the same time, they have been 
found not as closely equated as might be desired 
for future studies. 

List A: MILK, SALT, SHIP, JOY, BREAD, SHOVEL, 
WHISTLE, HAM, SICKNESS, STEEL, BEAUTIFUL, 
DOCTOR, PLAIN, FIDDLE, DIE, JAM, MASK, ROUGH, 
PEER, SMILE, GREEN, PAIN, BOY, BLOSSOM, LAMP, 
CALF, CROW, TABLE, MAN, SHACK, SCISSORS, LONG, 
EAGLE, SCAB, SOUR, TUG, TOBACCO, HIGH, STRAN- 
GLE, KNIFE. 

List B: WHISKEY, BUTTERFLY, STOOL, GLARE, 
STOMACH, JUSTICE, ATL, DISCHARGE, HEAT, OIL, 
MUTTON, DOCK, INCREASE, TABLET, FOOT, PAIL, 
HIT, RADIO, YELLOW, END, CAST, STREET, BITTER, 
STUD, BARK, SOFT, RIP, NAIL, BLACK, CRATE, 
THIRSTY, WOMAN, COTTAGE, SCALE, SWEET, FRIGID, 
NEEDLE, SHORT, AFRAID, EATING.


The implications of generative grammars of English for grammatical
relationships among sentence constructions were discussed. A series of
studies investigating the psychological similarity of sentence constructions
was carried out, employing judgment and recognition techniques. Metric and
nonmetric multidimensional scaling methods were used to analyze the data
and to analyze data reported by Mehler (1963). A highly consistent pattern
of psychological similarity was found among the constructions investigated.
The obtained pattern was congruent with the pattern of grammatical
relationships that the Katz and Postal (1964) approach appeared to imply.


Recent revolutionary changes in descriptive
linguistics, stemming
largely from the work of Noam Chomsky
in the mid-’50s (e.g., Chomsky, 1957),
promise to furnish descriptions of language 
that are more adequate for the psychologist 
working in the field of language structure 
than were the previously available de- 
scriptions. The new formulations are also 
seen by some as furnishing many hints 
about the workings of a language user. 
The research to be reported here is con- 
cerned with one of the topics treated by 
the modern generative grammars, that of 
the syntactic relationships among sentences.
Grammarians of Chomskian persuasion
claim that their grammars, in contrast
to the traditional grammars, clarify the 
underlying structural relationships among 
certain sentence constructions that are seen 
by the language user as intimately related 
or highly similar. The purpose of the pres- 
ent research is to obtain precise behavioral 
measures of the similarity of such syn- 
tactically related sentence constructions. As 
careful measures of the perceived simi- 
larity of intuitively and grammatically re- 
lated sentences, the obtained data should 
have intrinsic interest. Further, the data 
are relevant to the broad topic of the 
parallels between the linguistic description 
of a language and the abilities of the person 
who uses that language, that is, the rela- 
tion between grammar and “cognitive 
structure.” Finally, the data may be of 
value to the linguist who is willing to con- 
sider such empirical evidence in evaluating 
alternative statements of grammatical re- 
lations. 

The nature of the new grammars, gen- 
erative grammars, will be discussed in 
sufficient detail to make clear the types of 
sentence relationships they posit. No at- 
tempt will be made to consider the psycho- 
logical implications of generative gram- 
mars; for indications of such implications, 
the reader is referred to Miller and Chom- 
sky (1963), Chomsky (1961), and Katz 
and Postal (1964). The bulk of the monograph
will consist of a presentation of the
measured similarity of the sentence constructions
investigated, primarily in terms
of the scaled psychological distances among
the constructions. 


GENERATIVE GRAMMAR


Phrase Structure and Transformation Rules 


The aim of a modern grammar of a 
language is to provide an explicit enu- 
meration and description of the sentences 
in the language, relying on purely formal 
(especially, nonsemantic) statements. A 
generative grammar accomplishes this aim 
through the use of rules that can be ap- 
plied in such a way that they can generate 
all the sentences in the language, and only 
those sentences. The generative grammar 
is to contain a basic axiom and a set of 
rules specifying that the axiom is to be 
repeatedly rewritten in certain ways, producing
series of strings of symbols. The
final string in each series of strings is to 
be a sentence in the language, and the 
nonfinal strings in the series may be said to 
underlie the final sentence. It should be 
made clear that the rules to be considered 
here act upon, and produce, symbols and 
strings of symbols which underlie sentences 
in the language. They do not themselves act 
upon or produce actual sentences. That is, 
the grammatical rules generate the struc- 
tures underlying sentences, not the sen- 
tences themselves. 

The following unordered set of rules, 
similar to one presented by Miller (1962), 
is illustrative of one type of grammatical 
rule. In this example, → is to be read as
“is to be rewritten as” and S is the basic
axiom. Italicized multiple terms on the 
right end of the rewrite arrows, separated 
by commas, are lists of optional choices. 


Given: S
S → NP + VP
NP → T + N
VP → V + NP
NP → Bill, John, ...
T → the, a
N → boy, girl, ball, ...
V → hit, struck, ...


With this set of rules, such simple sentences
as “The boy struck the girl” and
“John hit the ball” can be generated. As
mentioned earlier, the grammar is to pro- 
vide a description of the sentences it 
enumerates. Properly phrased, the preced- 
ing set of rules would assign a structural 
description to each of the sentences it 
generates, that is, would indicate which 
parts of each sentence form functional units 
or “constituents” and would assign some 
class label to each constituent. For the 
first sentence generated in the present ex- 
ample, the rules would indicate that “the 
boy” and “struck the girl” form units (to 
be labeled NP and VP, respectively); that
“struck” and “the girl” form units (V and 
NP, respectively); and that “the,” “boy,” 
and “girl” form units (T, N, and N, re- 
spectively). 
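
As a rough illustration of how such rewrite rules enumerate sentences and assign structural descriptions, the following minimal Python sketch encodes the example rules as a table and expands the axiom S exhaustively. It is only a sketch under stated assumptions: the word lists contain just the items actually named in the example (the elided "..." entries are left out), capitalization is ignored, and the names RULES, derive, words, and bracket are hypothetical helpers introduced here, not part of any published grammar.

```python
from itertools import product

# Rewrite rules from the example. Each symbol maps to its optional choices;
# each choice is a sequence of symbols or words. Only the words named in
# the example are listed (assumption: the elided "..." items are omitted).
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["T", "N"], ["Bill"], ["John"]],
    "VP": [["V", "NP"]],
    "T": [["the"], ["a"]],
    "N": [["boy"], ["girl"], ["ball"]],
    "V": [["hit"], ["struck"]],
}


def derive(symbol):
    """Yield every labeled structure derivable from `symbol`."""
    if symbol not in RULES:  # a word: nothing left to rewrite
        yield symbol
        return
    for choice in RULES[symbol]:
        # Expand each part of the choice and combine all alternatives.
        for parts in product(*(list(derive(s)) for s in choice)):
            yield (symbol, *parts)


def words(tree):
    """Flatten a labeled structure into its list of words."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1:] for w in words(child)]


def bracket(tree):
    """Render a structure as a labeled bracketing of its constituents."""
    if isinstance(tree, str):
        return tree
    return "[" + tree[0] + " " + " ".join(bracket(c) for c in tree[1:]) + "]"


sentences = {" ".join(words(t)): t for t in derive("S")}
print(len(sentences))  # 128 sentences from this small rule set
print(bracket(sentences["the boy struck the girl"]))
# [S [NP [T the] [N boy]] [VP [V struck] [NP [T the] [N girl]]]]
```

The bracketing printed for "the boy struck the girl" corresponds to the constituent analysis described above: "the boy" is labeled NP, "struck the girl" VP, and so on down to the single words.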

The rules of the example represent only 
one of a number of different types of rules. 
Rules of this type are termed “phrase- 
structure” rules, and that part of a genera- 
tive grammar consisting of them, the 
“phrase-structure component.” Phrase- 
structure rules have the defining character- 
istic that they specify that a single symbol 
is to be rewritten as one of a specified set of 
symbols and that the symbol is to be so 
rewritten whenever it occurs (or whenever 
it occurs in a specified immediate context). 
That is, such rules rewrite only single sym- 
bols, not strings of symbols. Further, their 
application is not contingent upon the 
structure of the string of symbols in which 
the symbol to be rewritten is embedded.²

² It should be pointed out again that the phrase-structure
rules in a grammar of a natural language,
as well as the transformation rules to be considered
next, would not actually generate sentences in the
standard writing system of the language, as the
rules of our example do. Rather, they would generate
strings of more or less abstract symbols, representing
morphemes or classes of morphemes,
which later-applied rules would convert into a
phonetic notation. That is, they generate strings
which underlie a sentence.

Another type of rule proposed by Chomsky,
the “transformation rule,” differs from
phrase-structure rules on both these
counts. A transformation rule allows an
entire string (or set of strings) of symbols 
to be rewritten as a different string. Fur- 
ther, the applicability of a transformation 
rule to a particular string is contingent 
upon the structural description of the 
string; a particular transformation can be
applied only to strings of certain specified 
structural descriptions. An example of a 
transformation rule proposed by Chomsky 
(1957) is the passive transformation. The 
passive transformation specifies that a 
string of symbols (morphemes or names of 
classes of morphemes) having the structural
description NP1 - Aux - V - NP2
can be rewritten as NP2 - Aux + be + en - V - by + NP1.
The transformation
can be used to rewrite the string of symbols
that underlies, for instance, the sentence
“John hit the ball” into the string
that underlies “The ball was hit by John.”
The transformation rule, like the phrase- 
structure rule, is to be phrased in such a 
way that it provides a structural descrip- 
tion of its output. 
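
The dependence of a transformation on structural description can be sketched in a few lines of Python. The sketch below assumes Chomsky's (1957) statement of the passive rule (NP1 - Aux - V - NP2 rewritten as NP2 - Aux + be + en - V - by + NP1). The representation of a structured string as a list of labeled constituents, the morpheme "past" standing for the auxiliary, and the label "by-phrase" are all illustrative assumptions, and the later rules that would turn the output into "The ball was hit by John" are not modeled.

```python
def passive(string):
    """Apply the passive transformation to a structured string.

    `string` is a list of (label, morphemes) constituents. The rule applies
    only when the labels match NP - Aux - V - NP; otherwise None is returned.
    """
    labels = [label for label, _ in string]
    if labels != ["NP", "Aux", "V", "NP"]:
        return None  # structural description not met
    (_, np1), (_, aux), (_, verb), (_, np2) = string
    return [
        ("NP", np2),                     # the second NP moves to the front
        ("Aux", aux + ["be", "en"]),     # Aux + be + en
        ("V", verb),
        ("by-phrase", ["by"] + np1),     # by + the original first NP
    ]


# Hypothetical underlying string for "John hit the ball".
active = [
    ("NP", ["John"]),
    ("Aux", ["past"]),
    ("V", ["hit"]),
    ("NP", ["the", "ball"]),
]
print(passive(active))
```

Because the function checks the sequence of constituent labels before rewriting, it cannot be applied to a string with a different structural description, which is the property that distinguishes it from a phrase-structure rule.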

A transformation, such as the passive 
transformation, that specifies how a single 
string is to be rewritten is termed a “singulary”
transformation. Another type of
transformation rule, the “generalized” 
transformation, rewrites a set of two or 
more strings as a single string. For instance, 
there is a transformation rule which con- 
verts the strings underlying “John hit the 
ball” and “The ball is green” into the 
single string underlying “John hit the green 
ball.”

Transformation rules can also be classified
as “optional” or “obligatory.” An optional
transformation, such as the transformation
already mentioned, need not be
applied whenever a string to which it can be
applied is available to it. An obligatory
rule, however, like a phrase-structure rule,
must be applied whenever it can be applied.
An example of an obligatory transformation
is one which transforms the string
underlying the nonsentence* “They brought
in him” (analogous to “They brought in
the criminal”) into the string underlying
“They brought him in.”


Generative grammarians agree that
phrase-structure rules and obligatory transformation
rules are involved in the generation
of any sentence, no matter how simple
its structure. In the type of grammar
thus far described, some optional trans- 
formations are used to generate sentences 
of relatively complex structure. It may be 
pointed out that the use of optional trans- 
formation rules and generalized transforma- 
tion rules is currently being called into 
question by some linguists. It is argued 
that the strings underlying sentences may 
best be generated by the phrase-structure 
and the obligatory singulary transformation 
rules alone (cf. Chomsky, 1965). 

Other types of rules are used to convert 
the products of the phrase-structure rules
and the transformation rules, which are the 
strings of symbols underlying sentences, 
into sentences in phonetic notation. However,
the investigation to be reported here is
concerned with syntactic structure, and the 
essential description of the syntactic struc- 
ture of a language is provided by the 
phrase-structure and the transformation 
rules. For this reason, rules of lower level 
than these will not be considered here. 


Chomsky’s Transformational Grammar 


Chomsky, in his earlier published works 
(e.g., Chomsky, 1957, 1961, 1962; Chomsky
& Miller, 1963; for an introductory
treatment differing in some respects, see 
Bach, 1964) has provided portions of a 
grammar of English in which certain in- 
tuitively related sentences are described as 
being transformationally related to each 
other. In his grammar, the phrase-structure 
rules apply to the initial symbol and to the 
products of rewriting the initial symbol. 
Transformation rules apply to the products 
of the phrase-structure rules and to the 
products of other transformation rules.
The phrase-structure rules plus the applica- 
ble obligatory transformation rules are 
used to generate the strings underlying some 
of the simplest, most common, sentences in 
the language. These strings are called 
“terminal strings” and the corresponding 
sentences, “kernel sentences.” In English, 
these are the simple, active, declarative 
sentences such as “John hit the ball.” 


Optional transformation rules are applied 
to the terminal strings (not to kernel 
sentences) and are used to produce the 
more complex sentences in the language. 
Examples of optional singulary transforma- 
tions are the passive transformation, which 
produces strings underlying passive sentences
such as “The ball was hit by
John”; the negative transformation, which
allows the derivation of negative sentences
such as “John didn't hit the ball”; and the
question transformation, which produces
yes-no questions such as “Did John hit the
ball?” In some cases, transformation rules
can be applied to the products of other
transformations. For instance, the negative
transformation can be applied to the product
of the passive transformation (yielding,
eventually, the passive-negative sentence,
“The ball wasn't hit by John”), and
the question transformation can be applied
to the products of either or both the passive
and the negative transformations (producing
the passive question, “Was the ball hit
by John?”; the negative question, “Didn’t
John hit the ball?”; and the passive-negative
question, “Wasn't the ball hit by
John?”).

A kernel sentence and its corresponding 
passive represent a pair of sentences that 
Chomsky sees as intuitively related, in- 
tuitively similar structurally. In a grammar 
such as Chomsky’s, they are also seen to 
be transformationally related. They have 
essentially identical histories of derivation, 
except that the passive transformation was 
applied to only one of the pair of sentences. 
The application of the passive transformation
in the derivation of the kernel would
make it a passive. Similar transformational
relationships can be seen to exist between
the kernel and the negative, the kernel and
the question, the passive and the negative
question, etc. One can speak of a sentence
family, which is defined by a particular set
of optional singulary transformations and
a single terminal string, and which consists
of all the sentences which result from
the application of all permissible combinations
of the set of transformations and no
other optional transformations. The absence
of any of the specified optional transformations
will be considered to be a permissible
combination of transformations;
thus, the kernel is a member of each sentence
family.


TABLE 1
EXAMPLE OF P,N,Q SENTENCE FAMILY

Sentence                             Construction

The man closed the box.              K
The box was closed by the man.       P
The man didn't close the box.        N
Did the man close the box?           Q
The box wasn't closed by the man.    PN
Was the box closed by the man?       PQ
Didn't the man close the box?        NQ
Wasn't the box closed by the man?    PNQ


Table 1 displays the eight members of 
the sentence family defined by the passive, 
the negative, and the question transforma- 
tions, and the terminal string underlying 
“The man closed the box.” It is to be noted 
that the eight sentences used in a previous 
paragraph as an example of the result of 
applying the passive, the negative, and 
the question transformations form a similar 
sentence family, with “John hit the ball” 
as kernel. For brevity, such sentence fami- 
lies will be referred to as “P,N,Q” families, 
and the grammatical constructions of their 
member sentences will be abbreviated as 
K (kernel), P (passive), N (negative), Q
(question), ..., PNQ (passive-negative
question).
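
The membership of a P,N,Q family can be enumerated mechanically. The following short Python sketch assumes, as in the example, that every combination of the three optional transformations is permissible and that the empty combination is the kernel; the function name sentence_family is a hypothetical label used only for this illustration.

```python
from itertools import combinations

TRANSFORMATIONS = ("P", "N", "Q")  # passive, negative, question


def sentence_family(transformations=TRANSFORMATIONS):
    """Return the construction label for every combination of transformations."""
    family = []
    for size in range(len(transformations) + 1):
        for combo in combinations(transformations, size):
            # The absence of all transformations is the kernel, K.
            family.append("".join(combo) or "K")
    return family


print(sentence_family())
# ['K', 'P', 'N', 'Q', 'PN', 'PQ', 'NQ', 'PNQ']
```

With three optional transformations there are 2 x 2 x 2 = 8 combinations, which is why the family has eight members.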

The grammatical relationships among 
the members of a sentence family can be 
described by comparing their transforma- 
tional histories. Those sentences that 
differ by a single transformation would be 
the most closely related, while, in the pres- 
ent example, those differing by all three 
transformations would be the least closely 
related. Miller (1962) has furnished a
graphic way of presenting the transformational
relationships among sentence family
members. The members of a P,N,Q sentence
family can be represented in the cube of
Figure 1. The closeness of the relationship
between two sentences (two vertexes in
the diagram) is indicative of their degree
of syntactic relationship, and is measured
as a function of the number of lines in the
shortest path connecting them. Note that
this cube is non-Euclidean; specifically,
diagonals in the figure are undefined. In
traveling between two nonadjacent vertexes
of the cube, one must move through
some set of the intervening vertexes. This
property of the cube reflects the step-wise
property of the transformational grammar.
That is, in moving from one sentence construction
to a construction removed by, say,
three transformations from the first, one
must pass through strings underlying constructions
removed by one and by two
transformations. If it is assumed that the
distance between any two vertexes is equal
to the sum of the lengths of the shortest
series of lines connecting them and that all
corners in the figure are right angles, the
distances in the configuration are equal to
distances in a “city block” space (Attneave,
1950). That is, the distance between any
two points is the sum of the absolute values
of the differences between their projections
on each dimension. In Miller’s cube, one
transformation would be coordinated with
each dimension.


FIG. 1. Cube representing transformational relationships among sentences
(axes: question, passive, negative).
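
The city-block distances implied by the cube can be computed directly. The sketch below assumes that each edge of the cube has unit length (the text fixes no particular edge lengths) and places each construction at a vertex whose three coordinates record whether the passive, negative, and question transformations have been applied; the names vertex and city_block are hypothetical.

```python
def vertex(construction):
    """Map a construction label to cube coordinates (P, N, Q), each 0 or 1.

    The kernel label "K" contains none of the three letters, so it maps
    to the origin (0, 0, 0).
    """
    return tuple(int(t in construction) for t in "PNQ")


def city_block(a, b):
    """Sum of the absolute coordinate differences between two constructions."""
    return sum(abs(x - y) for x, y in zip(vertex(a), vertex(b)))


print(city_block("K", "P"))     # 1: one transformation apart
print(city_block("P", "N"))     # 2: the path must pass through K or PN
print(city_block("K", "PNQ"))   # 3: opposite corners of the cube
```

Under the unit-edge assumption the distance between two constructions is simply the number of transformations by which they differ, so no pair in the family is more than three steps apart.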


An Alternative Generative Grammar 


There is some doubt among linguists 
that the K, the N, the P, etc., sentences 
are best viewed as transformational vari- 
ants of the same sentence or the same 
underlying terminal string. Rather, some or 
all of the sentences, it is argued, should 
have different derivations in the phrase- 
structure component of the grammar. 

For instance, Lees (1960) writes his 
phrase-structure rules in such a fashion
that they, rather than the transformation 
rules, optionally introduce a negative 
morpheme (as a slightly aberrant mem- 
ber of a class of “preverbs”) in the gen- 
eration of a sentence. If the negative 
morpheme is produced in the phrase-struc- 
ture portion of the derivation of a sentence, 
then the sentence will be a negative; if not, 
then the sentence will be nonnegative (although
preverbs other than “not,” such as
“never,” or “seldom,” or “always,” may be
chosen, even if “not” is not chosen). There 
still is a negative transformation, but it 
acts only on sentences having underlying 
strings containing the “not” morpheme (or
some other preverb) and simply changes 
the order of the morphemes in the string. 
In such cases, the negative transformation 
is now an obligatory transformation. 

Katz and Postal (1964) provide similar 
treatments for the passive and question 
sentences, as well as for negative sentences. 
Phrase-structure rules, rather than transformation
rules, are to introduce a passive
morpheme or a question morpheme in the 
derivation of a passive or a question sen- 
tence. Again, there will presumably still be 
passive and question transformations, but, 
as was the case with the altered negative 
transformation, they will be obligatory, 
applicable only to strings having the appropriate
phrase-structure derivation, and
will serve primarily to change the order of 
the elements of the string and to delete 
certain of the elements. 

The treatment of negative sentences as
having different phrase-structure deriva- 
tions than nonnegative sentences is gen- 
erally viewed as being preferable to the 
transformational treatment of negatives. 
However, the analogous treatment of the 
remaining sentence constructions is more 
open to question. In fact, although they 
provide some grammatical justification for
their treatment, Katz and Postal (1964)
provide different phrase-structure derivations
for passives, questions, etc., for primarily
semantic reasons. They wish to provide
a description of both the syntactic
and semantic aspects of a language, using
a transformational grammar for the former
and Katz and Fodor’s (1963) type of
semantic theory for the latter. Loosely
speaking, such a semantic theory provides
rules used to interpret sentences semantically,
and Katz and Postal (1964) argue
that (in the case of sentences having no
generalized transformations in their derivation)
these interpretative rules should
apply to the structured strings produced by
only the phrase-structure rules. Since the
kernel and the question sentences, say, obviously
have different meanings, they must
have different underlying strings on the
phrase-structure level and therefore must
have different phrase-structure derivations.

It is not appropriate here to evaluate 
Katz and Postal’s approach. However, 
their approach does contain implications 
concerning the relationships among sen- 
tences which differ from those of other 
grammars. This section will be devoted to 
an examination and interpretation of these 
implications. 

An obvious attack on the problem of 
determining the relationships among sen- 
tences within the Katz and Postal frame- 
work is to claim that sentences are re- 
lated, or similar, as some function of the 
extent to which their underlying struc- 
tured strings are similar. Specifically, the 
relationship among sentences would be a 
function of the extent to which their under- 
lying strings have many morphemes in 
common. This approach indicates that 
“John hit the ball” and “The ball was hit
by John” would be about as closely related
to one another as “John hit the ball” and
“The ball hit John” are to each other, since
the sentences of each of these pairs share 
many of their morphemes. However, the 
linguist wants to argue that the sentences of 
the former pair are linguistically related 
versions of each other, while the sentences 
of the latter pair are not. 
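
The difficulty with a simple morpheme-overlap measure can be seen in a small Python sketch. It is only an approximation under a stated assumption: the surface words of each sentence stand in for the morphemes of its underlying string, which the overlap proposal would actually compare. The function name shared_morphemes is hypothetical.

```python
def shared_morphemes(a, b):
    """Count the distinct items two sentences have in common.

    Surface words are used as a stand-in for the morphemes of the
    underlying strings (an assumption made only for illustration).
    """
    return len(set(a.lower().split()) & set(b.lower().split()))


kernel = "John hit the ball"
print(shared_morphemes(kernel, "The ball was hit by John"))  # 4
print(shared_morphemes(kernel, "The ball hit John"))         # 4
```

Both pairs share the same four items, so a raw overlap count cannot separate the active-passive pair from the pair that merely reuses the same words with the roles reversed.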

Katz and Postal (1964) propose one reso- 
lution to the problem of distinguishing be- 
tween those sentences that are grammati- 
cally related and those that are not. They 
indicate that sentences whose underlying 
strings differ only in morphemes that are 
linguistically universal are the sentences
that are closely related grammatically. 
Linguistically universal morphemes are 
morphemes that are specified by a general 
theory of linguistic descriptions, rather 
than being morphemes specific to certain
languages. The negative morpheme, the 
question morpheme, and the passive mor- 
pheme, they claim, are likely to be such 
universal morphemes. Thus, it might be 
that the active and the passive, the question
and the nonquestion, etc., each differ
from one another in only one universal
morpheme, and thus that the grammatical 
relationships among them are much the 
same as the relations that existed when they 
were analyzed as transformational variants 
of the same terminal string. 

The solution, unfortunately, does not ap- 
pear to be this simple. For instance, in the 
Katz and Postal (1964) approach, the 
question and the kernel are viewed as 
differing by more than the single universal 
question morpheme. They are analyzed as 
differing also in the possession, by some 
strings involved in the derivation of the 
question sentence, of “wh” (another uni- 
versal morpheme) and “either-or” (a mor- 
pheme peculiar to English). It is suggested 
(Katz & Postal, 1964, p. 119) that a ques- 
tion like “Is John a doctor?” results from 
the disjunction between the string under- 
lying “John is a doctor” and the string un- 
derlying “John is not a doctor.” In fact, the 
question sentence is formally related to the 
disjunctive structure underlying “Either 
John is a doctor or not.” One of the com- 
ponents of this disjunction, it is argued, is a 
structured string underlying the kernel, and 
the other a structured string underlying the 
negative. Any relationship between the 
question and the kernel, or the question 
and the negative, is to be accounted for on 
the basis of this part-whole relationship. 
Thus the question is grammatically re- 
lated both to a string underlying the kernel 
and to a string underlying the negative. It
appears that the question would be more 
or less intermediate in closeness of gram- 
matical relationship to the kernel and the 
negative. 

This interpretation of the relationship be- 
tween questions and nonquestions must be 
considered tentative, and a similar reserva- 
tion must be made regarding the interpreta- 
tions of the relationships among the other 
sentence structures in the Katz-Postal ap- 
proach. Probably affirmative and negative 
sentences are to be considered as differing 
in only a single universal (negative) mor- 
pheme, and thus as directly related. A 
similar analysis apparently may be made 
of the active and the passive. It may be 
that the passive morpheme and the negative
morpheme can occur together in a
combinatorial fashion and that a kernel 
and a passive-negative sentence, or a pas- 
sive and a negative sentence, differ by just 
two universal morphemes in their under- 
lying strings. Such a treatment would indi- 
cate relations among the K, the N, the P, 
and the PN sentences (the nonquestions) 
that are very similar to the relations indi- 
cated by the transformational treatment. 

This is not the case, however, for the 
relations among the questions. It appears as 
if a negative morpheme and a question 
morpheme are to occur both in the string 
underlying an affirmative question and in 
the string underlying a negative question. 
Perhaps “did” and “didn’t,” and “was” and 
“wasn’t,” in affirmative questions and nega- 
tive questions, are simply alternative ways 
of writing the same element in an underly- 
ing string. Thus, there would be very little 
if any structural difference between affirma- 
tive questions and negative questions, such 
questions perhaps differing only stylisti- 
cally. Passive questions and active ques- 
tions, on the other hand, seem to differ in 
the possession of a universal passive mor- 
pheme, and thus to be related to each other 
in the same fashion as passive and active 
nonquestions. 

It may be legitimate to represent, at 
least tentatively, the sentence relationships 
posited by the nontransformational ap- 
proach as the prism of Figure 2, analogous 
to the cube of Figure 1. This prism differs 
from the cube primarily in the placement 
of the affirmative and the negative ques- 
tions close to one another, and midway be- 
tween the affirmative and the negative non- 
questions. On the assumption that questions 
that differ by the possession of the passive 
morpheme (Q and PQ; NQ and PNQ) are 
neither more nor less closely related than 
nonquestions so differing, the prism is given 
parallel active and passive planes. It is not 
perfectly clear that the figure should be 
three-dimensional, with the questions re- 
moved from the nonquestions; it might be 
legitimate to view the questions as falling in 
the same plane as the nonquestions and 
thus to represent all eight sentence construc- 
tions in a two-dimensional figure. 


Fig. 2. Prism representing phrase-structure relationships among sentences.


The cube representing the transforma- 
tional relationships among sentences was 
said to be non-Euclidean, reflecting the 
step-wise (i.e., number of transformational 
steps) nature of the relationships. On the
other hand, the Katz-Postal phrase-
structure relationships are essentially
“common elements” (or “differences in
elements”) relationships. Such relationships
do not so unequivocally suggest a step-wise, 
non-Euclidean representation. It would 
seem most reasonable to view the prism 
representing the phrase-structure relation- 
ships among the sentences as a Euclidean 
solid, at least in the absence of closer 
analysis of the relationships. 
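
The contrast between a step-wise and a Euclidean reading of the same solid can be illustrated with a short sketch. The coding of each construction as three binary features and the use of unit edge lengths are assumptions made here for illustration only; neither grammar specifies them.

```python
from itertools import combinations
import math

# Each construction is coded by which optional transformations
# (passive, negative, question) separate it from the kernel.
# Unit edge lengths are an illustrative assumption.
CONSTRUCTIONS = {
    "K": (0, 0, 0), "P": (1, 0, 0), "N": (0, 1, 0), "Q": (0, 0, 1),
    "PN": (1, 1, 0), "PQ": (1, 0, 1), "NQ": (0, 1, 1), "PNQ": (1, 1, 1),
}


def city_block(a, b):
    """Around-the-edges distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """Straight-line distance through the interior of the solid."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


for first, second in combinations(CONSTRUCTIONS, 2):
    a, b = CONSTRUCTIONS[first], CONSTRUCTIONS[second]
    print(f"{first:>3}-{second:<3} steps={city_block(a, b)} "
          f"euclidean={euclidean(a, b):.3f}")
```

Under these assumptions the step-wise distance simply counts the transformations separating two constructions, whereas the Euclidean distance grows only with the square root of that count.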


STUDIES OF SENTENCE SIMILARITY


Two alternative representations of the 
grammatical relatedness of sentence con- 
structions have been presented. It must be 
realized that these representations, espe- 
cially the syntactical relationships indi- 
cated by the nontransformational, phrase- 


structure approach, are to be regarded as 
tentative interpretations of incompletely 
worked-out grammars. Further, these rep- 
resentations have been considered to be 
metricized, perhaps to an illegitimate de- 
gree; there may be no justification for 
speaking of the relationships as distances, 
Euclidean or non-Euclidean. Finally, the 
hidden assumption used to construct the 
representations, namely that the unit of 
distance coordinated with a transformation 
(or universal morpheme) is the same, re- 
gardless of which transformation (or mor- 
pheme) is involved, has no justification 
in the grammars. In fact, one might be 
willing to argue that the grammars indicate 
that certain distances should be greater
than others. For instance, the transforma- 
tional approach analyzes the passive trans- 
formation as being more complex than the 
negative transformation, in the sense that 
more of the elements of underlying strings 
are changed by the former than by the 
latter. If transformational complexity is to
be coordinated with distance, then the
distance between actives and passives 
should be greater than the distance between 
affirmatives and negatives. However, the 
grammars do not speak clearly enough on 
the topic of the closeness of the relation-
ships between pairs of constructions differ- 
ing by a single transformation to allow any 
sure predictions to be made regarding the 
relative distances coordinated with the 
various transformations or differences in 
phrase-structure derivation. 

Given these reservations, one can ask if 
various behavioral measures of the simi- 
larity of sentences map out one of the rep- 
resentations of the grammatical relation- 
ships among the sentences. To provide an 
answer, a series of studies investigating the 
similarity among sentences was carried out. 
The sets of sentences investigated were 
those which the transformational approach 
analyzes as members of P,N,Q sentence 
families. (For ease of discussion, the vo- 
cabulary of a transformational grammar 
will be used throughout to describe the
sentence constructions and the relations 
among them.) The studies measured the 
judged similarity of such sentences, the 
confusion between such sentences in a 
recognition task, and the changes in con- 
fusion with repeated trials on the recogni- 
tion task. Further, some data presented by 
Mehler (1963) on confusions among such 
sentences in a recall task were reanalyzed. 


METHOD OF ANALYSIS


It seems appropriate to describe here the method 
of analysis to be used. The basic data from Experi- 
ments 1 and 2, and the reanalysis of Mehler's 
(1963) experiment, can be represented in an 8 X 8 
matrix of the measured similarities or dissimilari- 
ties between each of the 64 possible pairs of the 
eight sentence constructions (K, P, N,..., PNQ).
The data from Experiments 3 and 4 can be repre- 
sented similarly in 4 X 4 matrices. Such data 
representations are too complex to be compre- 
hensible, and statistical analyses of the data are 
similarly undesirably complex.

It is possible to achieve an efficient reduction 
of such data through multidimensional scaling pro- 
cedures, thereby determining spatial representa- 
tions of the sets of sentences, such that each sen- 
tence construction corresponds to one point in the 
spatial configuration and that the distances among 
the points are related to the measured similarities 


of the sentences (Shepard, 1960, 1962a, 1963). Such 
spatial representations are very suitable for test- 
ing the implications of the grammatical descrip- 
tions, which were themselves given spatial repre- 
sentations. 

Metric multidimensional scaling techniques 
(Torgerson, 1958) were used in some of the studies 
to determine the configurations appropriate for the 
sets of obtained similarity measures. In the re- 
maining studies, a nonmetric scaling technique de- 
veloped by Kruskal (1964a, 1964b) was used. In 
this technique, the matrices of similarities or dis- 
similarities serve directly as the input to the scal- 
ing program. For any specified number of dimen- 
sions in which the data scaling is to take place, the 
technique determines a set of distances among the 
points being scaled which satisfy the metric axioms 
governing physical distance. That is, it determines 
a configuration of points which can be interpreted 
as a spatial model of the objects being scaled. It 
outputs the coordinates of each point in the con- 
figuration, as well as the interpoint distances. The 
configuration determined by the program is such 
that the interpoint distances in it bear the closest 
possible monotonic relationship to the similarity
(or dissimilarity) measures used as input. No as- 
sumption about the function relating the similarity 
measures to the interpoint distances is made, ex- 
cept that the function is a monotonic one. That is,
for one distance in the resulting configuration to 
be greater than another distance, the measured 
similarity corresponding to the former distance 
must be less than the similarity corresponding to 
the latter. The similarities are thus assumed to be 
measures on only an ordinal scale, while the out- 
put distances satisfy the requirements of a ratio 
scale. 

Kruskal's procedure furnishes a measure of how 
well the distances fit the similarity measures, that 
is, how closely the function relating distances to 
similarities approaches monotonicity. This measure 
of the “success” of the scaling is the normalized 
residual variance of the monotone regression of 
distance upon similarity, and is termed stress. As 
the technique is an iterative one, the stress ob- 
tained for any set of data in a given dimensional- 
ity is to some extent a function of the number of 
iterations employed. However, by increasing the 
number of iterations, any desired degree of ap- 
proximation to the absolute minimum stress can be 
reached. As a rule of thumb, Kruskal (1964a) indi- 
cates that a final stress of 10% is to be considered
“fair,” 5% “good,” 2½% “excellent,” and 0%
“perfect.” 
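
A minimal sketch of how such a stress value might be computed is given here. It assumes the square-root form of the measure, the square root of the sum of squared deviations of the distances from their monotone regression divided by the sum of squared distances, and it breaks ties among dissimilarities arbitrarily. The function names are hypothetical and the sketch is not Kruskal's program.

```python
import math


def monotone_regression(values):
    """Least-squares nondecreasing fit (pool-adjacent-violators)."""
    blocks = []  # each block holds [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge blocks while their means violate the nondecreasing order.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


def stress(dissimilarities, distances):
    """Stress of a set of distances against ordinal dissimilarities."""
    order = sorted(range(len(dissimilarities)),
                   key=lambda i: dissimilarities[i])
    d = [distances[i] for i in order]
    d_hat = monotone_regression(d)
    residual = sum((a - b) ** 2 for a, b in zip(d, d_hat))
    return math.sqrt(residual / sum(a * a for a in d))


# Hypothetical values: the second and third distances are out of order.
print(stress([1, 2, 3], [1.0, 2.0, 3.0]))
print(stress([1, 2, 3], [1.0, 3.0, 2.0]))
```

A configuration whose distances are already in the same order as the dissimilarities yields a stress of zero, whatever the shape of the function relating the two.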

The technique allows scaling to take place in a 
variety of non-Euclidean spaces, as well as in 
Euclidean space. Of greatest interest is the possi- 
bility of scaling the similarity data in a city 
block space (Attneave, 1950). Such a capacity is 
of value, because, as it will be recalled, the dis- 
tances in the city block spatial model are equal 
to the “around-the-edges” distances of the trans- 
formational relationships representation of Figure 
1, when all angles in the solid are right angles.


Experiment 1*


Method 


The first study took the most obvious approach 
to determining the similarity among sentences, 
that of having the subjects (Ss) simply rank the 
similarity of the members of sets of K,P,N,..., 
PNQ sentences. The procedure used was the 
method of multidimensional rank order (Torger- 
son, 1958). 

Two groups of introductory psychology students 
at the University of Minnesota were run in a group 
setting, N = 55 in Group 1 and N = 43 in Group 
2. Each S was given a booklet containing 16 sets 
of eight sentences each. The eight sentences in a
set were the eight members of a P,N,Q sentence 
family; each set of sentences represented a differ- 
ent P,N,Q family. One member of each of these 16 
families was set apart as a “Standard” sentence. 
There were two Standard sentences in each of the 
eight sentence constructions. The Standard sen- 
tences are listed in Table 2. Note that in Experi- 
ment 1, unlike the other experiments to be re- 
ported, the combination of “do” plus the negative 
morpheme was not contracted. 

The Ss were instructed to rank the seven re- 
maining members of each set of sentences with re- 
spect to their similarity to the Standard sentence 
of the set, by assigning the number 1 to the sen- 
tence most similar to the Standard, the number 2 
to the next most similar, etc. The Ss were in- 
structed to rank in terms of the similarity of the 
sentences as they understood “similarity.” It was 
emphasized that there were no right or wrong an- 
swers. The task was administered as a paper and 
pencil test, with a liberal 1-hour time limit. 

The two groups of Ss were given the same sets 
of sentences to rank, in the same order; the two 
groups were used simply for the purpose of deter- 
mining the reliability of the rankings. The same 
random order of presentation of sentence families,
and order of presentation of sentence constructions 
within a sentence family, were used for all Ss. 

The data were analyzed first in terms of the re- 
liability of the mean ranks between the two groups 
and between the two Standard sentences in each 
construction. Following this, comparative distances 
between the pairs of constructions were obtained 
from the data of Group 1 only, using the method
described by Torgerson (1958). These comparative 
distances were treated as ordinal measures of the 
dissimilarity of the sentences and scaled using the 
Kruskal technique described earlier. Also, under the 
assumption that they were measures on a linear 
scale of the true interpoint distances, the com- 
parative distances were converted to distances on
a ratio scale by adding a constant, and the con-
figuration of sentences corresponding to these dis-
tances was determined.


* Ida Kurcz, of the University of Warsaw, col-
lected and performed the preliminary analyses on
the data reported as Experiment 1, during a post-
doctoral year at the University of Minnesota. The
authors wish to express their thanks to her for
making her data available to them.


TABLE 2
STANDARD SENTENCES USED IN EXPERIMENT 1

Betty did not read the book.
His mother prepared the dinner.
Did the horse eat the hay?
Was the pipe dropped by the plomber? (sic)
The tooth was not filled by the dentist.
Did not Mary see the fish?
The shirt was sold by the clerk.
John hit the ball.
Was not the piano moved by the truck?
The paper was written by the student.
The house was not built by the carpenter.
Was not the bell rung by Bob?
Was the cat chased by the dog?
The man did not close the box.
Did the professor give a lecture?
Did not the woman spank the child?


Results 


Reliability. Both the reliability over the 
two groups of Ss and the reliability over 
the two examples of Standard sentences in 
a particular construction were determined. 
A separate computation of reliability was 
made for each construction of the Standard 
sentences (that is, the reliability of the 
ranks assigned the seven sentences ranked 
with a K as Standard sentence, the reli- 
ability of the ranks assigned with P as 
Standard, etc., were all calculated). In 
determining the reliability between the two 
groups for Standard sentences in a particu- 
lar construction, the ranks assigned sen- 
tences by each group were averaged over the 
two examples of Standard sentences in that 
construction and over Ss in the group, and 
the averaged ranks for Group 1 were cor- 
related with the averaged ranks for Group 
2. Eight such correlations were determined, 
one for each sentence construction that 
served as Standard. In calculating the re- 
liability between the two examples of 
Standard sentences in each construction, 
the ranks assigned sentences for one of 
the examples were averaged over all the 
Ss in the two groups and correlated with 
the similarly averaged ranks for the other 


example. Again, eight correlations were ob-
tained.


TABLE 3
COMPARATIVE DISTANCES, EXPERIMENT 1

         K      P      N      Q     PN     PQ     NQ
P     -1.81
N      1.48   1.64
Q     -0.28   0.14   0.43
PN     1.65   1.18  -0.98   0.55
PQ     0.24  -0.14   0.67  -1.26   0.34
NQ     0.16   0.22   0.15  -0.92   0.29  -0.88
PNQ    0.28  -0.03   0.53  -0.80   0.19  -0.99  -1.29


The reliability of the averaged ranks was 
satisfactory. The correlations between the 
two groups ranged from +.97 to +.99, as 
did the correlations between the two ex- 
amples of Standard sentences. 
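
The reliability computation described above can be sketched as follows. The ranks shown are hypothetical and are not the study's data, and the use of a product-moment correlation is an assumption, since the text says only that the averaged ranks were correlated.

```python
import math


def mean_ranks(rankings):
    """Average each sentence's rank over Ss (one list of seven ranks per S)."""
    n = len(rankings)
    return [sum(r[i] for r in rankings) / n for i in range(len(rankings[0]))]


def pearson(x, y):
    """Product-moment correlation between two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical ranks of seven sentences against one Standard, two Ss per group.
group_1 = [[1, 2, 3, 4, 5, 6, 7], [2, 1, 3, 4, 5, 7, 6]]
group_2 = [[1, 3, 2, 4, 5, 6, 7], [1, 2, 3, 5, 4, 6, 7]]

r = pearson(mean_ranks(group_1), mean_ranks(group_2))
print(round(r, 3))
```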

Scaling. As mentioned earlier, only the 
data from Group 1 were scaled. Compara- 
tive distances between the pairs of sentence 
constructions were obtained using the 
method described by Torgerson (1958, pp. 
263-269). This method may be considered 
an extension of the Thurstone comparative 
judgment technique. Sixteen incomplete 
8 x 8 matrices (one for each Standard 
sentence), containing the frequencies with 
which Ss judged one sentence (i) in a 
family to be more similar to the Standard 
sentence (k) of the family than is another 
sentence (j) in the family, were obtained. 
An S's set of ranks for one sentence family 
was discarded if he made an error in the 
ranking, for example, if he assigned the 
same rank twice. No more than two Ss had 
to be discarded for any sentence family. 
The corresponding frequencies $_kf_{ij}$
in the pairs of matrices based on the
two standard sentences of one construction
were summed and divided by the total us-
able N for the pair of matrices, to obtain
eight incomplete 8 X 8 matrices of propor-
tions, $_kp_{ij}$. These proportions were con-
verted to normal deviates, which were used, 
following Torgerson’s (1958) least-squares 
procedure, to determine a single sym- 
metric 8 x 8 matrix of comparative dis- 
tances between the pairs of sentence con- 
structions. These comparative distances are 
presented in the halfmatrix of Table 3. 
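
The step from pooled frequencies to normal deviates may be sketched as follows. The frequencies and the usable N are hypothetical, the function name is an assumption, and the least-squares step that combines the deviates into comparative distances is not shown.

```python
from statistics import NormalDist


def proportion_to_deviate(freq_a, freq_b, usable_n):
    """Pool two frequencies, convert to a proportion, then to a normal deviate.

    freq_a and freq_b are the corresponding frequencies from the two
    matrices for one construction; usable_n is the total usable N for
    the pair of matrices. Proportions of exactly 0 or 1 have no finite
    deviate and make inv_cdf raise an error.
    """
    proportion = (freq_a + freq_b) / usable_n
    return NormalDist().inv_cdf(proportion)


# Hypothetical counts: sentence i judged closer to the Standard than
# sentence j by 41 and 36 Ss out of a usable total of 106.
print(round(proportion_to_deviate(41, 36, 106), 3))
```

A proportion of one half maps to a deviate of zero, and complementary proportions map to deviates of equal size and opposite sign.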

The halfmatrix of comparative distances 
(with the value 2.0 added to all entries, 
to make them all positive) was scaled using 


Kruskal’s technique. These dissimilarities 
were found to scale in one dimension with 
a stress of 0.00% (necessarily, in both 
Euclidean and city block spaces). K and P 
occupied identical positions in the con- 
figuration, as did N and PN, and the four 
questions fell on a single point halfway 
between the affirmative and negative non- 
questions. The plot of interpoint distances 
in the configuration against the input dis- 
similarities is presented in Figure 3. The 
plot may be termed a step function. Ap-
parently the data have a property, men- 
tioned by Shepard (1962b), that results in 
somewhat unsatisfactory solutions. When 
the points in subsets of the whole set of 
points are more similar to one another 
(closer together in the underlying con- 
figuration) than they are to any points out- 
side the subset, the scaling technique will 
indicate that all the points in each subset
have the same coordinates in the resulting
configuration, that is, are identical. Such a
situation is likely to result in an excessively
low dimensionality of the configuration. In 
the case where there are just two such sub- 
sets, the data will necessarily scale in one 
dimension. When there are three such sub- 
sets, the data will scale in at most two 
dimensions. In the present case, there seem 
to be three such subsets: K,P; N,PN; and 
Q,PQ,NQ,PNQ. The technique thus failed 
to discriminate between sentences differing 
by just the passive transformation, and be- 
tween all the question sentences. 
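
The subset property just described can be checked directly against the comparative distances of Table 3. The sketch below is illustrative only; it reads the PNQ-NQ entry as -1.29, which is an assumption about a damaged cell, and the three level names are labels introduced here.

```python
from itertools import combinations

LABELS = ["K", "P", "N", "Q", "PN", "PQ", "NQ", "PNQ"]
# Lower half-matrix of Table 3: entries of each row run over columns K, P, N, ...
# The last PNQ entry (PNQ-NQ) is assumed to be -1.29.
ROWS = {
    "P": [-1.81],
    "N": [1.48, 1.64],
    "Q": [-0.28, 0.14, 0.43],
    "PN": [1.65, 1.18, -0.98, 0.55],
    "PQ": [0.24, -0.14, 0.67, -1.26, 0.34],
    "NQ": [0.16, 0.22, 0.15, -0.92, 0.29, -0.88],
    "PNQ": [0.28, -0.03, 0.53, -0.80, 0.19, -0.99, -1.29],
}
dissim = {frozenset((row, col)): value
          for row, values in ROWS.items()
          for col, value in zip(LABELS, values)}

SUBSETS = [{"K", "P"}, {"N", "PN"}, {"Q", "PQ", "NQ", "PNQ"}]


def subset_of(label):
    return next(i for i, s in enumerate(SUBSETS) if label in s)


def level(a, b):
    """Classify a pair by the subsets its two members belong to."""
    sa, sb = subset_of(a), subset_of(b)
    if sa == sb:
        return "within"
    if 2 in (sa, sb):
        return "question-nonquestion"
    return "affirmative-negative"


levels = {"within": [], "question-nonquestion": [], "affirmative-negative": []}
for a, b in combinations(LABELS, 2):
    levels[level(a, b)].append(dissim[frozenset((a, b))])

for name, values in levels.items():
    print(f"{name:<22} min={min(values):5.2f} max={max(values):5.2f}")
```

With these values the three levels do not overlap: every within-subset dissimilarity is smaller than every question-nonquestion dissimilarity, which in turn is smaller than every affirmative-negative dissimilarity among nonquestions. Three points on a line therefore reproduce the rank order exactly, which is the step function described above.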

Some conclusions may be drawn from 
the obtained configuration, regardless of its 
shortcomings. The position of the questions 
midway between the affirmatives and the 
negatives is congruent with the Katz and 
Postal (1964) grammatical analysis, as is 
the lack of any distinction between affirma- 
tive questions and negative questions. The 
disappearance of any distinction between 
sentences due to the passive-active differ- 
ence is congruent with neither of the 
proposed models, although it may be inter- 
pretable in terms of the lack of any differ- 
ence in meaning between passives and ac- 
tives. However, it is plausible that the 
passive-active difference in the obtained 
dissimilarity measures may simply be a 
good deal smaller than the affirmative- 
negative difference and that the present 
divisibility of the sentences into homogene- 
ous subsets has obscured the relatively 
small passive-active difference. 

The Kruskal technique uses only the 
ordinal information present in the data. 
The comparative distances used as input 
data, however, are presumably measures 
on an interval scale. These comparative 
distances may be scaled using metric scal- 
ing techniques described in Torgerson 
(1958), as suggested by Shepard (1962b). 
Such a procedure should allow a more 
sensitive determination of any passive- 
active difference present in the data. 

The comparative distances of Table 3 
must be converted to distances on a ratio 
scale by adding a constant, in order to 
determine the multidimensional configura- 
tion corresponding to them. The neces- 
sary additive constant was estimated by 
the procedure of Messick and Abelson 


(1956) to be equal to 2.527. The distances 
resulting from the addition of this constant 
to the comparative distances are the dis- 
tances between the points (sentence con- 
structions) of the configuration of sentence 
constructions, under the assumptions re- 
quired by the scaling technique used. The 
coordinates of the points in this configura- 
tion were determined by calculating the 
matrix of the scalar products of the cen- 
troid-origin vectors of the eight points in 
the configuration, using the procedure sug- 
gested by Torgerson (1958, pp. 254-259). 
This matrix of scalar products was factored 
using the principal-axes method. The load- 
ings of each point on the first n factors ex- 
tracted are the coordinates of the points on 
the largest n dimensions of the configura- 
tion. 
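
The conversion from comparative distances to centroid-origin scalar products can be sketched on a small case. The three-point example and its constant of 0.5 are hypothetical, and the factoring step is omitted; the sketch shows only the addition of the constant and the double centering of the squared distances.

```python
def scalar_products(distances):
    """Centroid-origin scalar products from a symmetric distance matrix.

    b_ij = -1/2 * (d_ij^2 - mean of row i - mean of column j + grand mean),
    where the means are taken over the squared distances.
    """
    n = len(distances)
    sq = [[d * d for d in row] for row in distances]
    row_mean = [sum(row) / n for row in sq]
    grand = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand)
             for j in range(n)] for i in range(n)]


# Hypothetical comparative distances for three points lying on a line at
# 0, 1 and 3, each true distance having had a constant of 0.5 subtracted.
comparative = [[0.0, 0.5, 2.5], [0.5, 0.0, 1.5], [2.5, 1.5, 0.0]]
ADDITIVE_CONSTANT = 0.5  # illustrative; the estimate reported above is 2.527

distances = [[0.0 if i == j else h + ADDITIVE_CONSTANT
              for j, h in enumerate(row)]
             for i, row in enumerate(comparative)]
B = scalar_products(distances)
for row in B:
    print([round(b, 3) for b in row])
```

In this example the centered coordinates are -4/3, -1/3, and 5/3, and each entry of B is the product of the corresponding pair of coordinates, so a single factor recovers the configuration.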

The factor solution obtained indicated 
that the configuration of sentences could 
be considered to exist in a real space. One 
negative latent root was obtained, but its 
absolute value was only 3% of the value of 
the sum of the positive latent roots and 
could thus be considered to be well within 
the limits of error. 

Further, the solution indicated that the 
configuration could be fairly adequately 
represented in three dimensions. Following 
the method of Torgerson (1958, pp. 278- 
279), the total variance of a scalar product 
matrix derived from the first three factors 
extracted was compared with the total vari- 
ance of the scalar product matrix which 
was factored. The sum of squares of the 
elements of the matrix derived from the 
first three factors equaled 96% of the value 
of the sum of squares of the original scalar 
product matrix. 
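
Both adequacy checks can be expressed in terms of the latent roots. The sketch below relies on the fact that the sum of squares of the elements of a symmetric matrix equals the sum of its squared latent roots, so that a matrix reproduced from the n largest roots has a sum of squares equal to the sum of those n squared roots. The roots listed are hypothetical, not those obtained in Experiment 1, and the function names are introduced here for illustration.

```python
def sum_of_squares_ratio(latent_roots, n_factors):
    """Share of the scalar product matrix's sum of squares that is
    reproduced by the n_factors largest latent roots."""
    ordered = sorted(latent_roots, reverse=True)
    retained = sum(root ** 2 for root in ordered[:n_factors])
    total = sum(root ** 2 for root in ordered)
    return retained / total


def negative_root_share(latent_roots):
    """Absolute sum of negative roots relative to the sum of positive roots."""
    positive = sum(r for r in latent_roots if r > 0)
    negative = sum(-r for r in latent_roots if r < 0)
    return negative / positive


# Hypothetical latent roots of an 8 X 8 scalar product matrix.
roots = [10.0, 4.0, 2.0, 1.0, 0.5, 0.2, 0.0, -0.5]
print(round(sum_of_squares_ratio(roots, 3), 3))
print(round(negative_root_share(roots), 3))
```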

The three-dimensional configuration of 
sentences whose coordinates equaled the 
loadings on the first three factors was 
translated and rotated to a position such 
that the coordinates of the K construction 
were (0,0,0); the coordinates of the P con-
struction (x,0,0), where x indicates sim- 
ply that the coordinate of P on the first 
dimension was free to vary; and the co- 
ordinates of the N construction (y,z,0).
The resulting coordinates are presented in 
Table 4. The dimensions of the rotated 




TABLE 4
ROTATED FACTOR LOADINGS, EXPERIMENT 1
(PROJECTIONS OF POINTS OF SENTENCE
CONFIGURATION ON THE
DIMENSIONAL AXES)

                       Dimension
Construction   Passive   Negative   Question
K                0.00      0.00       0.00
P                1.96      0.00       0.00
N                0.33      4.06       0.00
Q                0.10      1.67       1.52
PN               1.87      3.73       0.06
PQ               1.19      1.62       1.74
NQ               0.53      2.03       1.58
PNQ              1.32      1.79       1.64


configuration may be identified as a pas- 
sive dimension, a negative dimension, and 
a question dimension. The projections of 
the configuration on each pair of dimen- 
sions are displayed in Figure 4, together 
with an oblique projection of the entire 


configuration (with the passive dimension 
foreshortened by 4/2 for clarity of pres- 
entation). These projections may be com- 
pared with the predicted configurations, 
Figures 1 and 2.

There seems to be substantial similarity
between the obtained configuration and 
that predicted on the basis of the Katz- 
Postal (1964) analysis. Sentence construc- 
tions that are less closely related in the 
grammars (that is, sentences differing by 
more transformations or by more universal 
morphemes) are generally farther apart in 
the configuration. As was indicated by the 
Kruskal technique, the questions fall about 
midway between the affirmatives and the 
negatives. The distance between affirmative 
questions and negative questions is, as ex- 
pected, very small, when compared to the 
affirmative-negative distance among non- 
questions. In addition, the questions are 
somewhat removed from the plane of the
nonquestions, a characteristic about which
the Katz-Postal analysis seemed to make
no prediction.

Fig. 4. Two-dimensional projections of Experiment 1 configuration, with oblique projection inset.

A passive-active distance comparable in 
magnitude to the question-nonquestion dis- 
tance, and decidedly smaller than the 
affirmative-negative distance among non- 
questions, appears in the configuration. As 
was expected, the passive-active distance 
among questions is comparable in magni- 
tude to the passive-active distance among 
nonquestions, although perhaps slightly 
smaller in the former. 
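
These comparisons can be reproduced from the coordinates of Table 4. The sketch below assumes that the seventh row of the table is the NQ construction and that straight-line distances in the three rotated dimensions are the quantities being compared.

```python
import math

# Coordinates from Table 4 (passive, negative, question dimensions).
# The seventh row is taken to be NQ, an assumption about its label.
COORDS = {
    "K": (0.00, 0.00, 0.00), "P": (1.96, 0.00, 0.00),
    "N": (0.33, 4.06, 0.00), "Q": (0.10, 1.67, 1.52),
    "PN": (1.87, 3.73, 0.06), "PQ": (1.19, 1.62, 1.74),
    "NQ": (0.53, 2.03, 1.58), "PNQ": (1.32, 1.79, 1.64),
}


def distance(a, b):
    """Euclidean distance between two constructions in the configuration."""
    return math.dist(COORDS[a], COORDS[b])


PAIRS = {
    "passive-active, nonquestions": [("K", "P"), ("N", "PN")],
    "passive-active, questions": [("Q", "PQ"), ("NQ", "PNQ")],
    "affirmative-negative, nonquestions": [("K", "N"), ("P", "PN")],
    "affirmative-negative, questions": [("Q", "NQ"), ("PQ", "PNQ")],
}

for name, pairs in PAIRS.items():
    values = ", ".join(f"{a}-{b}={distance(a, b):.2f}" for a, b in pairs)
    print(f"{name}: {values}")
```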

The configuration obtained may be said 
to be essentially that predicted on the 
basis of the Katz and Postal analysis, with 
the corrective that the distance correspond- 
ing to one constructional difference need not 
equal the distance corresponding to another. 
It was decided to see if the same general 
configuration would be obtained using a 
different technique for the measurement of 
the similarity of sentences. 


Experiment 2 


The second study was designed to obtain 
a measure of the confusions among the 
eight sentence constructions of a P,N,Q
sentence family in a recognition task. The 
task used was derived from one used by 
Mink (1962) to study the generalization 
among associatively related words and is an 
extension of the procedure used by Clifton, 
Kurcz, and Jenkins (1965) to study con-
fusion among the K,P,N, and PN sentence 
constructions. Essentially, S in this task is 


shown a list of sentences a small number 
of times, and then is shown a longer list, 
including sentences transformationally re- 
lated to those of the first list, with instruc- 
tions to press a telegraph key to every sen- 
tence that appeared on the first list. 


Method 


The method will be sketched in rough outline 
first. Each of 48 Ss was given six presentations of 
a list of eight sentences. Each sentence had a dif- 
ferent content (was a member of a different P,N,Q 
sentence family) and was in a different one of the 
K,P,N,..., PNQ grammatical constructions. The S 
was then presented once with a list of 128 sen- 
tences, and instructed to press a telegraph key 
whenever he recognized a sentence from the first 
(training) list. Sixty-four of these 128 sentences 
were transformationally related to the training list 
sentences, and 64 were unrelated. Each sentence 
on the training list had eight sentences related to 
it on the second (test) list. These eight sentences 
were all the members of the sentence family of 
which the training list sentence was a member.

A measure of the confusions between any train- 
ing list sentence and all the sentences related to it 
by some combination of the passive, the negative, 
and the question transformations could thus be ob- 
tained. These confusion measures could be pre- 
sented in an 8 X 8 confusion matrix in which the 
rows represent the test sentence constructions and 
the columns represent the training sentence con- 
structions. A number of training list forms were 
used for the purposes of balancing and replicating 
the design. 
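
A sketch of how such a confusion matrix might be tallied from individual test trials follows. The record format and the sample records are hypothetical; only the layout, with rows for test constructions and columns for training constructions, follows the description above.

```python
CONSTRUCTIONS = ["K", "P", "N", "Q", "PN", "PQ", "NQ", "PNQ"]


def confusion_matrix(records):
    """Proportion of key presses for each test/training construction pair.

    records: iterable of (test_construction, training_construction, pressed).
    Rows index the test construction, columns the training construction.
    Cells with no trials are returned as None.
    """
    index = {c: i for i, c in enumerate(CONSTRUCTIONS)}
    size = len(CONSTRUCTIONS)
    presses = [[0] * size for _ in range(size)]
    trials = [[0] * size for _ in range(size)]
    for test, training, pressed in records:
        r, c = index[test], index[training]
        trials[r][c] += 1
        presses[r][c] += int(pressed)
    return [[presses[r][c] / trials[r][c] if trials[r][c] else None
             for c in range(size)] for r in range(size)]


# Hypothetical trials: a passive test sentence whose kernel was trained.
records = [("P", "K", True), ("P", "K", False), ("P", "K", True),
           ("K", "K", True), ("K", "K", True)]
matrix = confusion_matrix(records)
print(matrix[1][0], matrix[0][0])
```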

Materials. Sixteen kernel sentences were selected 
from those listed in Table 5 and divided into two 
sets, A and B, as indicated. The eight members of 
the P,N,Q sentence family corresponding to each 
of these kernels were determined, yielding a total 
of 128 sentences. 

TABLE 5
KERNELS OF FAMILIES IN EXPERIMENTS 2 AND 3

Set A

Betty read the book.
The plumber dropped the pipe.*
The doctor cured the patient.*
The woman spanked the child.
Bob rang the bell.
The man closed the box.*
The professor gave the lecture.*
The girl wrote the letter.*
The truck moved the piano.*
The tractor pulled the plow.*
The student wrote the paper.
Joe painted the wall.

Set B

The dentist filled the tooth.*
The carpenter built the house.*
The clerk sold the shirt.*
Tom washed the window.
John hit the ball.
The boy broke the glasses.*
The dog chased the cat.*
His mother prepared the dinner.
The horse ate the hay.*
Mary saw the fish.
The barber cut the hair.*
The bus carried the people.*

* Sentence used in Experiment 2.


Sixteen eight-sentence training list forms were
constructed. The eight A forms each contained one
sentence from each of the sentence families of the 
Set A sentences, and the eight B forms one sen- 
tence from each of the Set B families. Each of the 
16 forms contained 1 sentence in each of the 
K,P,N,..., PNQ constructions, and each of the 128 
sentences appeared on just 1 training list form. 
The particular construction in which a representa- 
tive of a specific sentence family occurred in a 
training list form was determined by an orderly
progression, so that one of the Set A families, say, 
would be represented by K in the first A training 
list form, by P in the second, by N in the third, 
etc. Three randomizations were made of each train-
ing list form, the same randomizations being ap- 
plied to each form. 
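
One way to realize such an orderly progression is a cyclic shift, sketched below. The cyclic rule, the order of the constructions beyond K, P, N, and the family labels are all assumptions made for illustration; the text specifies only that a family appears as K in the first form, P in the second, N in the third, and so on.

```python
CONSTRUCTIONS = ["K", "P", "N", "Q", "PN", "PQ", "NQ", "PNQ"]


def training_forms(families):
    """Build one training list form per construction by cyclic shifting.

    In form i, the family at position f is represented by the construction
    at position (f + i) mod 8 (an assumed rule).
    """
    n = len(CONSTRUCTIONS)
    return [
        {family: CONSTRUCTIONS[(f + i) % n]
         for f, family in enumerate(families)}
        for i in range(n)
    ]


families = [f"family_{k}" for k in range(1, 9)]  # hypothetical labels
forms = training_forms(families)

# Every form contains each construction once, and every family occurs
# once in each construction over the eight forms.
print([forms[i]["family_1"] for i in range(3)])
```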

A single test list which contained all 128 sen- 
tences used in the experiment was constructed. 
Sixty-four of these sentences were members of Set 
A sentence families, and 64 were members of Set 
B families. The test list could be paired with each 
one of the 16 training list forms. When the test list 
was paired with a Form A training list, the 64 Set 
A test list sentences were related to the training 
list sentences, while the 64 B test list sentences 
were unrelated and served as control sentences. 
The status of these two sets of 64 sentences as re- 
lated or control sentences was reversed when the 
test list was paired with a B training list form. 

The relationships between the sentences in any 
one training list and the related sentences in the 
test list can be represented in an 8 X 8 matrix 
in which the rows represent the test sentence con- 
struction and the columns the training sentence 
construction, similar to the matrix used in Experi- 
ment 1. Each cell in this matrix was filled by one 
test list sentence in any particular training
form-test list pairing. Further, the various training
list forms were constructed so that the sentence 
families were balanced over the cells. That is, 
every sentence family was represented in every 
cell under some pairing. Finally, the balanced 
8 X 8 matrix was replicated over the two sets of 
sentences, A and B. 

Three randomizations of the test list were 
made. One randomization was initially made, and 
the other two randomizations were derived from 
the first by a systematic rearrangement of the 
first, second, and third thirds of the initial random- 
ized list of sentences, with rerandomization within 
the thirds. Each of these three randomizations 
was paired with each of the 16 training list forms, 
yielding a total of 48 pairings. 

Apparatus. The lists of sentences were typed on 
memory drum tapes in capital letters, with all 
punctuation marks except the question mark 
eliminated. A Lafayette memory drum was used
to present the sentences to Ss. A telegraph key 
was mounted in front of the memory drum and 
connected to a 12-volt transformer-operated buzzer 
audible to both S and the experimenter (E). A tape 
recorder was used to present the instructions. 

Subjects and procedures. Forty-eight under- 
graduates at the University of Minnesota were 


used as Ss. The Ss were obtained from the intro- 
ductory psychology class and were given experi- 
mental credit counting toward their course grade 
for participating in the experiment. Each S heard 
instructions indicating that he was going to see 
a list of sentences and instructing him to press 
the telegraph key immediately after he silently 
read each sentence. The S was instructed to try 
to remember the sentences on the list so that he 
could recognize them later. The three randomiza- 
tions of one training list were then presented 
twice (for a total of six times through the list of 
sentences) at the rate of one sentence every 4 
seconds, with a 4-second interval between random- 
izations. 

Immediately after the presentation of the final 
randomization, S heard instructions indicating that 
he was going to see a longer list containing the 
sentences he had seen on the first list as well as 
some others. He was instructed to press the tele- 
graph key whenever he thought that he recog- 
nized a sentence, but not to press it when he was 
sure that he did not recognize a sentence. He was 
then given one of the three randomizations of the 
test list at a 4-second rate. 

Three Ss were randomly assigned to each train- 
ing list form, and one S in each form was randomly 
assigned to each test list randomization. Half the
Ss were randomly assigned to one of the two Es 
(PO), and the remaining Ss to the other E (CC). 


Results 


The basic data consist of the proportion 
of Ss pressing to the test list sentences in 
each relationship classification. These data 
are presented in Table 6, in the matrix of 
test list sentence-training list sentence re- 
lationships. These recognition proportions 
index the amount of confusion which oc-
curred between the various pairs of sen-
tence constructions, except for the entries
on the main diagonal, which indicate the 
proportion of correct recognitions. 

To the right of the matrix is a column 
containing the proportion of presses made 
to the control sentences. As was reported 
by Clifton, Kurcz, and Jenkins (1965),
the number of false recognitions of control 
sentences, relative to false recognitions of 
related sentences, is negligible. It is obvi- 
ous that there is much more generalization, 
or confusion, among the members of a 
transformationally defined sentence fam- 
ily than between unrelated sentences. 

TABLE 6
PROPORTION OF POSSIBLE PRESSES TO TEST LIST SENTENCES, EXPERIMENT 2

Test list                         Training list construction
construction   K         P         N         Q         PN        PQ        NQ        PNQ       Control
K              [illeg.]  .50       .33       .58       .19       .85       .46       .23       .02
P              .60       .85       .40       .54       .52       [illeg.]  [illeg.]  .52       .03
N              .33       .21       .65       .27       .50       .33       .40       .38       .02
Q              .27       .29       .35       .60       .21       .38       .44       .29       .02
PN             [illeg.]  .27       .50       .27       [illeg.]  .46       .31       .46       .02
PQ             .25       .38       .27       .35       .44       .58       .48       .69       .02
NQ             .33       .29       .54       .60       .48       .52       [illeg.]  .56       .03
PNQ            .29       .46       .98       .40       .54       .56       .50       .79       .05

Since the study was, in effect, replicated
over the two sets (A and B) of sentences,
it is possible to gauge the reliability of the
confusion measures. One balanced confu-
sion matrix was obtained for each of the
sets of sentences, and the 64 scores in one 
matrix were correlated with the 64 scores 
in the other matrix. The obtained correla- 
tion equaled +.72. Since this correlation is 
analogous to a split-half reliability coeffi- 
cient, the Spearman-Brown correction was 
applied, yielding a corrected reliability 
coefficient of +.84. A correlation of this 
magnitude is reassuring evidence of the 
stability of the phenomenon and of its 
invariance over sentences of different con- 
tent. 
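
The Spearman-Brown step can be checked with a few lines of Python. The function name `spearman_brown` is ours; the formula is the standard one for predicting the reliability of a lengthened test, applied here with a length factor of 2 because the Set A and Set B matrices act as two halves.

```python
def spearman_brown(r_half: float, length_factor: float = 2.0) -> float:
    """Step up a part-test correlation to the reliability of a test
    `length_factor` times as long (2.0 for the split-half case)."""
    return length_factor * r_half / (1.0 + (length_factor - 1.0) * r_half)


# Correlation between the Set A and Set B confusion matrices, as reported.
r_ab = 0.72
corrected = spearman_brown(r_ab)
print(f"corrected reliability = {corrected:.2f}")  # prints 0.84
```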

One might be concerned about the de- 
pendence of the results upon the particular 
test list randomizations used, especially 
since only three different randomizations 
were used. To determine if the randomiza-
tion was a critical factor, a Test List Ran-
domization × Test List Sentence Classifica-
tion (the classifications in the matrix of
Table 6, omitting the diagonal cells) × Ss
(3 × 56 × 48) analysis of variance was
run. The entries in the analysis were the
numbers 0 (no press) or 1 (press). A sig-
nificant effect of test list sentence classi-
fication was obtained (F (55, 2475) =
4.14, p < .001), while the Fs corresponding
to test list randomization and to the 
randomization by sentence classification 
interaction were less than 1. It appears 
that test list randomization is not a criti- 
cal factor in the results obtained. 

As in Experiment 1, the Kruskal scaling 
technique was used to construct a multi- 
dimensional scale of the distances among 
the eight sentence constructions. The basic 
confusion proportion data may be consid-
ered direct measures of the similarity of
the sentence constructions. The data were
made suitable for scaling purposes by
dividing each proportion, $p_{ij}$, by the pro-
portion on the main diagonal in the same
row, $p_{ii}$. This conversion serves two pur-
poses: First, it provides a correction for any
tendencies to respond more to some sen- 
tence constructions than to others, ten- 
dencies which seem to be present in situa- 
tions like the present one (Clifton, 1964; 
Odom, 1964). Second, it serves to make 
the entries on the main diagonal all equal 
to 1, indicating perfect similarity between 
a sentence and itself. Symmetric entries 
in the matrix of converted scores ($p_{ij}/p_{ii}$
and $p_{ji}/p_{jj}$) were averaged for the sake of
the stability of the scores, and a halfmatrix 
of averaged converted scores was obtained. 
This halfmatrix, without the main diago- 
nal, was to serve as the input data for 
the scaling program. It is of interest to 
note that the values in this halfmatrix of 
similarities correlated -.73 with the val-
ues in the halfmatrix of comparative dis-
tances obtained from the sentence rank-
ings of Experiment 1. This indicates a
fairly close relationship between the 
judged similarity of sentences and the 
recognition confusions among them. 
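
The conversion from confusion proportions to the halfmatrix of similarities can be sketched in Python as follows. The 3 × 3 matrix is hypothetical (it is not taken from Table 6), and the function name `converted_similarities` is ours; rows are assumed to index test list construction and columns training list construction, as in Table 6.

```python
def converted_similarities(p):
    """p[i][j]: proportion of presses to test construction i after training
    on construction j. Returns the averaged, row-normalized half matrix as a
    dict keyed by (i, j) with i < j."""
    n = len(p)
    # Divide every entry by the diagonal entry of its own row.
    conv = [[p[i][j] / p[i][i] for j in range(n)] for i in range(n)]
    # Average the two symmetric entries; the diagonal (all 1.0) is dropped.
    return {(i, j): (conv[i][j] + conv[j][i]) / 2.0
            for i in range(n) for j in range(i + 1, n)}


# Hypothetical confusion proportions, for illustration only.
toy = [
    [0.80, 0.40, 0.20],
    [0.30, 0.60, 0.30],
    [0.10, 0.25, 0.50],
]
for pair, s in converted_similarities(toy).items():
    print(pair, round(s, 3))
```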

Scaling was attempted in both two and
three dimensions, using both Euclidean and
city block spatial models. A satisfactory
spatial representation could not be ob-
tained in two dimensions, the minimum
stress proving to equal 13.3% when scal-
ing took place in Euclidean space, and
7.4% in a city block space. In three dimen-
sions, however, a configuration with satis-
factory stress was obtained for both the
Euclidean and the city block spatial
models. The stress for the Euclidean con-
figuration equaled 1.4%, and the stress for
the city block, 1.9%.

TABLE 7
ROTATED CONFIGURATION, EXPERIMENT 2

                         Dimension
Construction   Passive   Negative   Question
K              0.00      0.00       0.00
P              1.10      0.00       0.00
N              0.42      1.75       0.00
Q              0.31      0.74       1.11
PN             1.54      1.59       -0.02
PQ             1.46      0.74       0.77
NQ             0.39      0.94       1.16
PNQ            1.44      1.14       1.29

Upon examination, it became clear that 
the three-dimensional configurations did 
not have the regular character needed for 
the city block metric to be applicable to 
the lattice-metric Miller-Chomsky transfor-
mational cube of Figure 1. For this reason,
and due to the ubiquity of the Euclidean 
model in other applications of multidimen- 
sional scaling, it was decided to present only 
the results of the Euclidean scaling. 
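
For readers who want to see what the stress percentages measure, the following Python sketch computes Kruskal's stress (formula 1) under the two spatial models. It assumes the disparities (the monotone-regression fitted values) are already in hand; the points, pairs, and disparities shown are hypothetical, and the function names are ours.

```python
import math


def minkowski(a, b, r):
    """Distance between points a and b: r=2 is Euclidean, r=1 is city block."""
    return sum(abs(x - y) ** r for x, y in zip(a, b)) ** (1.0 / r)


def stress(distances, disparities):
    """Kruskal's stress (formula 1): normalized residual between the
    configuration distances and the monotonically fitted disparities."""
    num = sum((d - dh) ** 2 for d, dh in zip(distances, disparities))
    den = sum(d ** 2 for d in distances)
    return math.sqrt(num / den)


# Hypothetical three-point configuration and hypothetical disparities.
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
pairs = [(0, 1), (0, 2), (1, 2)]
disparities = [1.0, 1.5, 1.0]
for r, name in ((2, "Euclidean"), (1, "city block")):
    d = [minkowski(points[i], points[j], r) for i, j in pairs]
    print(f"{name}: stress = {100 * stress(d, disparities):.1f}%")
```

In an actual scaling run the configuration is adjusted iteratively to minimize this quantity; the sketch only evaluates it for a fixed configuration.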

The obtained Euclidean configuration 
was translated and rotated to an a priori 
determined position, as was done in Ex- 
periment 1. The coordinates of the K point 
were set at (0,0,0), the Y and Z coordi- 
nates of P were set equal to 0 with the X 
coordinate left free to vary, and the Z 
coordinate of N was set at 0 with the X 
and Y projections being unfixed. The re- 
sulting configuration of sentence construc- 
tions is presented in Table 7. The first 
dimension can be considered an active- 
passive dimension, the second an affirma- 
tive-negative dimension, and the third a 
nonquestion-question dimension. 
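
The translation and rotation to the a priori position can be illustrated with a Gram-Schmidt construction. This is a sketch under our own assumptions, not the authors' program: the unrotated coordinates are hypothetical, the function name `orient` is ours, and the sign of the third axis is fixed only by the right-hand rule, so a reflection may still be needed to match a published orientation.

```python
import math


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def scale(a, k):
    return [x * k for x in a]


def unit(a):
    return scale(a, 1.0 / math.sqrt(dot(a, a)))


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def orient(points, origin="K", first="P", second="N"):
    """Translate `origin` to (0,0,0), put `first` on the X axis and `second`
    in the X-Y plane. Interpoint distances are unchanged."""
    shifted = {k: sub(v, points[origin]) for k, v in points.items()}
    ex = unit(shifted[first])
    # Gram-Schmidt: remove the X component of `second` to get the Y axis.
    ey = unit(sub(shifted[second], scale(ex, dot(shifted[second], ex))))
    ez = cross(ex, ey)
    return {k: [dot(v, ex), dot(v, ey), dot(v, ez)] for k, v in shifted.items()}


# Hypothetical unrotated coordinates, for illustration only.
raw = {"K": [1.0, 2.0, 3.0], "P": [1.0, 3.0, 3.0],
       "N": [2.0, 3.0, 3.0], "Q": [1.0, 2.0, 5.0]}
for name, xyz in orient(raw).items():
    print(name, [round(c, 2) for c in xyz])
```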

The two-dimensional projections of the
configuration, as well as an oblique pro-
jection with the passive dimension fore-
shortened by 4/2, are presented in Figure
5. The configuration, it will be noted, bears
a close resemblance to the configuration
obtained in Experiment 1, using a metric
scaling technique, except for the greater
distance between passives and actives and
some irregularity in the distances among
the questions.

Fig. 5. Two-dimensional projections of Experiment 2 configuration, with oblique projection inset.

A number of conclusions may be drawn 
from an examination of the projections. 
As in Experiment 1, the grammatically 
more distantly related sentences are far- 
ther apart in the configuration. The ques- 
tions fall about midway between the 
affirmative and the negative nonquestions. 
The Q point is very close to the NQ point, 
and the PQ point to the PNQ point. On 
the other hand, the distance between pas- 
sive and active questions is sizable and 
comparable to the distance between pas- 
sive and active nonquestions. Considering 
only the nonquestions, it appears that the
distance between affirmatives and nega-
tives is slightly greater than the distance
between actives and passives. Clifton,
Kurcz, and Jenkins (1965) found affirma-
tives and negatives to be significantly less
similar (more distant from one another) 
than passives and actives, and a similar con- 
trast, in exaggerated form, was obtained 
in Experiment 1. As in Experiment 1, the 
configuration obtained is essentially that 
expected on the basis of the Katz and 
Postal (1964) analysis. 

For the purposes of the Kruskal scaling 
technique, the similarity measures (the 
converted response proportions) were con-
sidered to be measures on only an ordinal
scale. However, for the purposes of the 
experiments to be reported next, it was 
thought worthwhile to consider similarity 
as being measured on an interval scale and 
to determine the function relating the simi- 
larity measures to the scaled distances. The 
relationship between converted response 
proportion, S, and distance, D, appeared to 
be continuous and linear (a test for signifi- 
cance of curvilinearity yielded an F < 1, 
with df = 6/20). The best-fit line (least
squares) was described by the equation
$D = 2.9 - 2.5S$.
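
A short Python sketch shows both halves of this step: fitting a least-squares line to (S, D) pairs and then applying the reported line to convert a similarity to a distance. The data points are hypothetical, and the names `fit_line` and `similarity_to_distance` are ours; only the intercept 2.9 and slope -2.5 come from the text.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


def similarity_to_distance(s, intercept=2.9, slope=-2.5):
    """Apply the reported line D = 2.9 - 2.5 S."""
    return intercept + slope * s


# Hypothetical (S, D) pairs scattered around the reported line.
sims = [0.2, 0.4, 0.6, 0.8, 1.0]
dists = [2.45, 1.85, 1.40, 0.95, 0.35]
a, b = fit_line(sims, dists)
print(f"fitted: D = {a:.2f} + ({b:.2f}) S")
print(f"S = 0.6 -> D = {similarity_to_distance(0.6):.2f}")
```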


Experiment 3 
In the third study, confusions among
subsets of the sentence constructions in- 
vestigated in Experiment 2 were measured. 
The technique used was essentially the 
same as that in the previous study, except 
that each S was presented only four of 
the eight sentence constructions. Further, 
multiple examples of each of the four con- 
structions—and thus of each of the 16 
possible training sentence construction-test 
sentence construction relationships—were 
shown each S. Five groups of Ss were run, 
each group receiving a different set of four 
syntactic constructions. The studies were
designed to test the generality of the previ- 
ously obtained measures of similarity among 
sentence constructions in a somewhat differ- 
ent experimental situation. Also, the tech- 
nique of analysis used allowed a check on 
the generality of the function obtained in the 
previous study relating the similarity meas-
ures to scaled interpoint distance.


Method 


Materials. The 24 K sentences listed in Table 
5 were constructed, and divided into two sets, A 
and B. It will be noted that the sentences used
include those used in Experiment 2. The seven 
other members of the P,N,Q sentence families 
of which the Table 5 kernels are members were 
determined. 

A different set of training lists and test lists 
was constructed for each of five groups of Ss. The 
lists differed among the groups only in the gram- 
matical constructions of the sentences they con- 
tained. The Group 1 lists contained Q,PQ,NQ, and 
PNQ sentences; Group 2, K,P,Q, and PQ; Group 
3, N,PN,NQ, and PNQ; Group 4, K,Q,NQ, and
N; and Group 5, P,PQ,PNQ, and PN. It may be 
noted that these five groups of sentences form 
five of the six faces of the cube of Figure 1. The 
sixth face, K,P,N, and PN, was investigated by 
Clifton, Kurcz, and Jenkins (1965) and was also
investigated in Experiment 4 of the present series. 
The procedures followed in constructing the
lists were those used by Clifton, Kurcz, and 
Jenkins (1965) and are very similar to those used 
in Experiment 2. A training list contained 12 sen-
tences, 3 sentences in each of the four constructions.
Eight training list forms were constructed for each 
group. Four forms contained one sentence from 
each of the families of the Set A sentences, and 
four contained one sentence from each of the 
Set B families. The four forms containing sentences
from Set A families differed in the construction
in which any particular sentence appeared, as did
the four forms containing members of Set B
families. The construction in which a sentence
appeared in a training list form was determined
by an orderly progression, so that a sentence fam- 
ily that was represented by, for instance, a K in 
one training list form would be represented by a 
Q in the next form, an NQ in the next, and an N 
in the final form. Four training list randomizations 
were made. The same randomizations were applied 
to each form.
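
The "orderly progression" of constructions across training list forms behaves like a cyclic Latin square, which can be sketched in Python. The family labels and the function name `training_forms` are placeholders, and the assignment of families to starting constructions is an assumption; only the K, Q, NQ, N ordering comes from the example in the text.

```python
from collections import Counter


def training_forms(families, constructions):
    """Return one {family: construction} mapping per form, rotating each
    family through the constructions in a fixed cyclic order."""
    k = len(constructions)
    return [
        {fam: constructions[(i + form) % k] for i, fam in enumerate(families)}
        for form in range(k)
    ]


# Ordering from the example in the text; family labels are placeholders.
constructions = ["K", "Q", "NQ", "N"]
families = [f"A{i + 1}" for i in range(12)]
forms = training_forms(families, constructions)

# Each form should hold 3 sentences in each of the four constructions.
for n, form in enumerate(forms, start=1):
    print(n, dict(Counter(form.values())))
```

Across the four forms, every family appears once in each construction, which is what balances sentence content against construction.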

A single test list was constructed for each 
group. The test list for one group consisted of 
all the 96 sentence family members in the gram- 
matical constructions presented to that group. 
Forty-eight of these sentences were members of 
Set A families, and 48 were members of Set B 
families. The test list was paired with each train- 
ing list form received by the group for which it 
was designed. Thus, for each training list-test list 
pairing, 48 of the test list sentences were trans- 
formationally related to training list sentences, 
and 48 were unrelated. The training list sentence-
test list sentence relationships for each group can
be represented in a 4 X 4 matrix, with rows rep- 
resenting test list sentence construction and col- 
umns representing training list sentence construc- 
tion. In Experiment 3, unlike Experiment 2, there 
were three examples of each of the 16 relationships: 
that is, 3 of the 48 related test list sentences 
could be classified in each cell of the matrix. 

Three randomizations of the test lists were 
made, using the procedure described in Experi- 
ment 2. The apparatus used in that study was em- 
ployed. 

Subjects and procedures. Forty-eight University 
of Minnesota introductory psychology students
were used in each group, for a total N of 240. 

The procedures employed were basically the 
same as those of Experiment 2, and Ss received 
essentially the same instructions. Each S was 
shown the four randomizations of a training list 
form, given the test instructions, and then shown 
one randomization of the test list appropriate for 
his group. Again, a recognition of a test list sen-
tence was scored if S pressed a telegraph key when
that sentence appeared. 

The five groups were run successively, the first 
48 Ss being assigned to Group 1, the second 48 to 
Group 2, etc. Within each group, six Ss were ran-
domly assigned to each training list form. Each of
the three test list randomizations was assigned 
to two of the Ss receiving each training list form. 
One of the two experimenters (PO and CC) ran 
one of the Ss receiving a particular training list 
form-test list randomization combination, and the 
other E ran the other S in that combination. 


Method of Analysis 


The basic data obtained from each group 
were the proportions of times Ss re- 
sponded to a test list sentence in a 
given training construction-test construction 
classification. Within each group, two in-
dependent sets of measures of these pro-
portions were available, one from Ss who
received a training list containing the Set
A sentences, and one from Ss who received 
a training list containing the Set B sen- 
tences. These two sets of proportions could 
be correlated with one another to obtain 
an estimate of their reliability. 

For the purposes of scaling, the obtained 
data were converted to distances using the 
formula that was found to relate con- 
verted similarity scores to the interpoint 
distances in Experiment 2. These dis- 
tances were scaled using the metric tech- 
nique employed in Experiment 1. The re- 
sulting configurations were compared with 
the corresponding subconfigurations of Ex- 
periment 2. For the purpose of facilitating 
this comparison, the interpoint distances 
of the Experiment 2 subconfigurations 
were scaled in the same fashion. Such a 
procedure results in configurations identi-
cal to the subconfigurations of the entire
Experiment 2 configuration, Figure 5. The 
procedure was applied simply because it 
oriented these subconfigurations so that 
they would be directly comparable to the 
Experiment 3 configurations, and because 
it allowed a quantitative determination of 
the dimensionality of the subconfigura- 
tions. 


Results 


Reliability. The number of responses 
made to the Set A test sentences in each 
training sentence construction-test sen-
tence construction classification was cor-
related with the number of responses made
to the corresponding Set B test sentences 
for each group. The correlations, when cor- 
rected by the Spearman-Brown formula, 
ranged from +.88 to +.92 for the five 
groups. The correlations indicate generally 
satisfactory reliability of the confusion 
measures over two sets of sentences and 
subjects. 

Scaling. The proportion of possible 
presses made to the test list sentences of 
the different classifications is presented in 
Table 8 for each of the five groups. The 
proportion of presses made to the control 
sentences is not shown, being uniformly 
quite low. 

TABLE 8
PROPORTION OF POSSIBLE PRESSES TO TEST LIST SENTENCES, EXPERIMENT 3

         Test list            Training list construction
Group    construction

                        Q         PQ        NQ        PNQ
1        Q              .62       .53       [illeg.]  [illeg.]
         PQ             .56       .67       [illeg.]  [illeg.]
         NQ             .72       .54       .72       .58
         PNQ            .58       .67       .66       .77

                        K         P         Q         PQ
2        K              .17       .49       .45       [illeg.]
         P              .59       .85       [illeg.]  .83
         Q              .40       .35       .58       .44
         PQ             .33       .47       .43       .62

                        N         PN        NQ        PNQ
3        N              .72       .44       .56       .35
         PN             .40       .69       [illeg.]  .59
         NQ             .61       .49       .76       .51
         PNQ            .47       .65       .59       .73

                        K         Q         NQ        N
4        K              .70       .44       .33       .32
         Q              .38       .64       .51       .42
         NQ             .35       .58       .65       .51
         N              .25       .36       .38       .53

                        P         PQ        PNQ       PN
5        P              .76       .60       .63       .51
         PQ             .51       .65       .53       .48
         PNQ            .53       .62       .73       .56
         PN             .31       .31       [illeg.]  .50

Converted similarity scores were calcu-
lated from the proportions of possible
presses for each group by dividing each
entry in the matrix by the main diagonal 
entry in its row, and the symmetrical con- 
verted scores were averaged. These aver- 
aged scores were converted to distances 
by the formula $D = 2.9 - 2.5S$ ($S$ = aver-
aged converted similarity score), and the
scalar products of the centroid-origin 
vectors of the configuration having these 
interpoint distances were determined. The 
resulting scalar product matrices were 
factored using the principal-axes pro- 
cedure. The scalar product matrices of the 
interpoint distances of the subconfigura- 
tions obtained in Experiment 2 were also 
calculated and factored. 
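
The step from interpoint distances to scalar products of centroid-origin vectors is the double-centering operation of classical metric scaling. The Python sketch below shows only that step; the distance matrix is a hypothetical unit square, the function name `scalar_products` is ours, and the subsequent principal-axes factoring (an eigendecomposition of the resulting matrix) is omitted.

```python
def scalar_products(dist):
    """Double-center the squared distances, B = -1/2 * J * D^2 * J, giving
    scalar products of vectors drawn from the centroid. `dist` must be a
    symmetric matrix of interpoint distances."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]   # equals column means by symmetry
    grand = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand)
             for j in range(n)] for i in range(n)]


# Hypothetical distances: corners of a unit square, taken in order around it.
r2 = 2 ** 0.5
square = [
    [0.0, 1.0, r2, 1.0],
    [1.0, 0.0, 1.0, r2],
    [r2, 1.0, 0.0, 1.0],
    [1.0, r2, 1.0, 0.0],
]
for row in scalar_products(square):
    print([round(b, 2) for b in row])
```

For the square, each diagonal entry is the squared distance of a corner from the centroid, adjacent corners have a scalar product of zero, and opposite corners have a negative one, as expected for vectors pointing in opposite directions.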

No sizable negative latent roots were
obtained in any of the five analyses of
the data of Experiment 3 or the five anal-
yses of the Experiment 2 subconfigura-
tions. The largest negative root was ob-
tained in the analysis of the data of Group
1, Experiment 3, and it amounted in abso-
lute value to 3% of the sum of the positive
latent roots extracted. The Experiment 3
configurations thus may be considered to
exist in real space.

The dimensionality of the configurations 
resulting from the factor analyses was 
tested in the same way as in Experiment 1. 
All the configurations could be considered 
as being two-dimensional without imposing 
too much distortion on the data. The sums 
of squares of the elements in the scalar 
product matrices which were factored were 
compared with the sums of squares of the 
elements of the scalar product matrices 
derived from the first two factors extracted 
in each analysis. The latter equaled 99.6%
of the former for Group 1 of Experiment
3, and 99.5% for the corresponding sub-
configuration of Experiment 2; 99.6% for
Group 2 of Experiment 3 and 99.9% for
the corresponding Experiment 2 subcon-
figuration; 98.9% for Group 3 of Experi-
ment 3, and 99.7% for the corresponding
Experiment 2 subconfiguration; 98.4%
for Group 4 of Experiment 3, and 99.9%
for the corresponding Experiment 2 sub-
configuration; and 95.1% for Group 5 of
Experiment 3 and 99.9% for the corre-
sponding Experiment 2 subconfiguration.
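
Both diagnostics used here, the two-factor share of the sum of squares and the relative size of the largest negative root, can be computed directly from the latent roots. The Python sketch below relies on the fact that, for a symmetric matrix, the sum of squared elements equals the sum of squared latent roots; the roots shown are hypothetical and the function names are ours.

```python
def two_factor_fit(roots):
    """Share of the scalar-product sum of squares reproduced by the two
    largest latent roots (the first two factors extracted)."""
    ordered = sorted(roots, reverse=True)
    return sum(r * r for r in ordered[:2]) / sum(r * r for r in ordered)


def negative_root_share(roots):
    """Largest negative root, in absolute value, relative to the sum of the
    positive roots. Returns 0.0 when no root is negative."""
    most_negative = min(min(roots), 0.0)
    return abs(most_negative) / sum(r for r in roots if r > 0)


# Hypothetical latent roots for a four-point configuration.
roots = [2.10, 1.40, 0.15, -0.05]
print(f"{100 * two_factor_fit(roots):.1f}% of the sum of squares")
print(f"{100 * negative_root_share(roots):.1f}% negative-root share")
```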

The two-dimensional configurations of 
sentence structures were rotated and trans-
lated, and are presented in Figure 6. The
configuration obtained for each group of 
Experiment 3 is placed next to the sub- 
configuration containing the same sen- 
tence structures in the Experiment 2 con- 
figuration. 

While there are discrepancies within the 
pairs of corresponding configurations, it 
seems reasonable to emphasize the simi- 
larities between the Experiment 3 con- 
figurations and the corresponding Experi- 
ment 2 subconfigurations. First, all the 
Experiment 2 subconfigurations were es- 
sentially two-dimensional, with each 
dimension in each subconfiguration gen- 
erally identifiable with a grammatical 
transformation. The configurations ob- 
tained by converting the Experiment 3 
data to distances using the formula derived 
from the Experiment 2 results were also 
all essentially two-dimensional in the 
same fashion. Second, the most striking 
result of Experiment 2—the fact that 
affirmative questions and negative ques-
tions are relatively close together, about
midway between the affirmative and the 
negative nonquestions—was obtained in 
Experiment 3. Third, the distance be- 
tween affirmative and negative nonques- 
tions is, in Experiment 3 as in Experiment 
2, the greatest difference between any pair 
of sentences which may be analyzed as 
differing by a single transformation or a 
single universal morpheme. These three 
types of similarity between the results of 
the two experiments seem to confirm the 
dual assumptions (a) that similarities
among sentence structures are essentially
the same whether they are measured
simultaneously among all eight structures
investigated or are measured independently
among subsets of these eight structures,
and (b) that the formula found in Experi-
ment 2 to relate the measure of confusions
among sentences to distances among them
may be generalized to confusion measures
obtained using a somewhat different pro-
cedure.

Fig. 6. Experiment 3 configurations, with Experiment 2 subconfigurations.

The discrepancies between the mem- 
bers of pairs of corresponding configura- 
tions should not be overlooked, however. 
The pattern of confusions among the
four question constructions (Group 1, Ex- 
periment 3) was very similar to the pat- 
tern of confusions among the same four 
constructions obtained in Experiment 2. 
However, the distance between active 
questions and passive questions seemed to 
be smaller in the former than in the latter. 
That is, there were relatively more confu- 
sions between active and passive questions 
when the Ss were shown only questions than 
when they were shown questions in the con- 
text of nonquestions. 

The distance between P and PQ in 
Group 2 and the distances between N and
NQ and between PN and PNQ in Group
3 were relatively smaller than the corre- 
sponding distances obtained in Experiment 
2. While an approximately equal number 
of confusions was found in Experiment 2 
between sentences which could be analyzed 
as differing by the question transforma- 
tion and between sentences differing by 
the passive transformation, generally more 
confusions were found between pairs of 
sentences of the former type than between 
pairs of the latter type in Experiment 3. 

The configuration of sentences obtained 
in Group 4 showed no systematic devia-
tions from the corresponding configuration
of Experiment 2. However, the distance 
between P and PNQ in the configuration 
of Group 5 was noticeably small when 
compared to the P-PNQ distance in Ex- 
periment 2. It should be noted that the 
N-NQ distance in Group 4 and the P-PQ 
distance in Group 5 were not strikingly 
smaller than the corresponding distances 
in the Experiment 2 subconfigurations, 
while these same distances were rela- 
tively small in the Groups 2 and 3 con- 
figurations. However, the PN-PNQ dis- 
tance in Group 5 did seem to be too small 
relative to the PN-PNQ distance of Ex- 
periment 2, just as it did in Group 3. 

The extent to which these deviations may 
be considered reliable is not clear. There 
seems to be no particular system to the 
deviations, except for an inconsistent 
tendency for the various question con- 
structions to be more often confused with 
each other and with nonquestion con- 
structions when there is a smaller vari- 
ety of different constructions with which 
they may be contrasted, as in Experi- 
ment 3. It does not seem, however, that 
this uncertain generalization may be ex- 
tended to confusion with the K construc- 
tion, which appears to be uniformly quite 
distinguishable from all other construc- 
tions in Experiment 3. 

The results of Experiment 3, then, seem 
to confirm the major conclusions of Ex- 
periment 2. However, these results do 
differ from the Experiment 2 results in 
their fine details, for reasons that are not 
clear. 


Experiment 4⁴


In Experiment 4 (as in Clifton, Kurcz,
& Jenkins, 1965), the confusion among 
the K,P,N, and PN sentence constructions 
was investigated. The same basic pro- 
cedures used in Experiment 3 were used. 
However, no control sentences appeared in 
Experiment 4, and the Experiment 3 tech- 
nique was extended by giving repeated 
training and test trials on the sentences.
Further, confusion among sentences con- 
taining the auxiliary verb “have” was com- 
pared with confusion among sentences not 
containing this auxiliary. The syntactical 
relationships among the K,P,N, and PN 
sentences remain the same whether or not 
the auxiliary appears in the sentence. How- 
ever, the relative degree of graphemic or 
phonemic (or physical) similarity among 
the constructions changes appreciably 
when the auxiliary is added. The study 
should therefore shed a little light on the 
importance of physical resemblance for 
similarities among sentences. 


Method 


Materials. One set of 12 K sentences was 
selected from the sentences used in Experiment 3. 
The P,N, and PN forms corresponding to these 
kernels were determined. Four training list forms 
were constructed in the same fashion as the train- 
ing lists of one sentence set were constructed in 
Experiment 3. Nine randomizations of each of 
these training list forms were made. 

Six randomizations of a single test list were 
made. This test list contained all 48 sentences 
used, but did not contain any unrelated (control) 
sentences. 

A parallel set of training and test lists was 
made, using sentences which contained a form of 
the auxiliary “have” (e.g., “John has hit the
ball,” “The ball has been hit by John,” “John 
hasn’t hit the ball,” “The ball hasn’t been hit by 
John”). These lists were identical to the first set 
of lists, except that all the sentences were in the 
present perfect construction rather than in the 
simple past. 

Apparatus. The apparatus used was similar to 
that used previously, except that two memory 
drums were used, one for the training lists and 
one for the test lists. 


⁴ The data reported as Experiment 4 were col-
lected by David M. Harrington and Michael Ryan
while they were NSF Undergraduate Summer Fel- 
lows at the University of Minnesota. The authors 
express their gratitude to them for allowing the 
use of their data in this report.

Subjects and procedures. Two groups of 24
University of Minnesota summer session students 
were run. Twelve Ss in each group were male, and 
12 were female. The Group 1 Ss were assigned 
to the lists containing sentences in the simple past, 
and the Group 2 Ss to the lists containing present 
perfect sentences. Six Ss in each group were 
assigned to each training list form, and two Ss 
receiving one training list form were assigned to 
each of three orders of presentation of the test list 
randomizations. Each E (DH and MR) ran three
Ss assigned to each of the training list forms in 
each group, and each E ran one-half male and 
one-half female Ss. With these restrictions, Ss 
were randomly assigned to conditions and Es. 
The Ss were given the instructions used in Experi- 
ment 3, modified to explain the use of two 
memory drums and to describe the repeated trials 
procedure used. An abbreviated set of instructions
was read to the Ss after each exposure of the test
list.

Each S was shown four randomizations of the 
training list form to which he was assigned, at a 
4-second rate. He was then moved to the other 
memory drum and was shown the initial randomi- 
zation of the test list. Upon completing the first 
test list randomization, S was moved back to the 
first memory drum, briefly reinstructed, and 
shown a new single randomization of his training 
list form. He then returned to the second memory 
drum and was shown a second test list randomiza-
tion. The cycle was repeated until S had received
six different test list randomizations. A 3-minute 
rest was given between the third test list and the 
fourth training list. 


Results 


The proportion of possible presses to 
test list sentences on the first and sixth
trials is shown in Table 9 for Group 1, 
and in Table 10 for Group 2. It is of 
interest to note that the values for the 
first trial of Group 1 correlate +.95 with 
the corresponding values reported by 
Clifton, Kurcz, and Jenkins (1965), using
a different sample of Ss, a somewhat differ-
ent set of sentences, and 48 unrelated con-
trol sentences on the test list. It may be
noted that there appeared to be better 
discrimination (more responding to train- 
ing sentences, less to related test sen- 
tences) in the present study than in the 
study by Clifton et al. This may reason- 
ably be attributed to the lack of any 
distracting control sentences in the present 
study. Still, the pattern of generalization 
is remarkably constant across the studies. 

Also worthy of note is the fact that the 
first trial scores in Table 9 (Group 1, sim- 
ple past sentences) correlated +.95 with 
the first trial scores in Table 10 (Group 
2, present perfect sentences). There does 
seem to be a higher level of generalization 
among the sentences seen by Group 2 than 
those seen by Group 1. A Tense X Gen- 
eralization Category (off-diagonal cell in 
the matrix) X Trials analysis of variance 
supports this impression. A significant 
effect of tense (F (1, 46) = 5.78, p < .05) 
was found, as well as a significant effect of 
category (F (11, 506) = 38.85, p < .01). 
A significant trials effect was also found 
(F (5, 230) = 58.72, p < .01). Further, 
the Tense × Categories and the Cate-
gories × Trials interactions were signifi-
cant (F (11, 506) = 3.08, p < .01, and F
(55, 2530) = 2.31, p < .01, respectively).
The Tense x Categories X Trials inter- 
action approached, but did not reach, sig- 
nificance (F (55, 2530) = 1.33, with 1.42 
needed for the .05 level). The Tense X 
Trials interaction did not approach sig- 
nificance (F < 1). 
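
The degrees of freedom in this analysis follow from the design, and a short Python sketch makes the bookkeeping explicit. It assumes the usual partition for one between-Ss factor (Tense, 2 groups of 24 Ss) crossed with within-Ss factors (12 off-diagonal Categories, 6 Trials); the function name `mixed_anova_df` is ours.

```python
from itertools import combinations
from math import prod


def mixed_anova_df(n_subjects, between, within):
    """(effect df, error df) for every term in a design with one between-Ss
    factor and fully crossed within-Ss factors.

    between: (name, levels); within: dict of name -> levels."""
    b_name, b_levels = between
    ss_within_groups = n_subjects - b_levels        # Ss nested in groups
    table = {b_name: (b_levels - 1, ss_within_groups)}
    names = list(within)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            df = prod(within[f] - 1 for f in combo)
            label = " x ".join(combo)
            # A within term and its interaction with the between factor
            # share the same error term.
            table[label] = (df, df * ss_within_groups)
            table[f"{b_name} x {label}"] = (df * (b_levels - 1),
                                            df * ss_within_groups)
    return table


exp4 = mixed_anova_df(48, ("Tense", 2), {"Categories": 12, "Trials": 6})
for term, (df1, df2) in exp4.items():
    print(f"{term}: F({df1}, {df2})")
```

Under the same assumptions, `mixed_anova_df(48, ("Randomization", 3), {"Classification": 56})` gives 55 and 2475 df for the classification effect of Experiment 2.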

TABLE 9
PROPORTION OF POSSIBLE PRESSES TO TEST LIST SENTENCES, EXPERIMENT 4,
SIMPLE PAST SENTENCES (GROUP 1)

                                  Training list construction
Test list                  Trial 1                                 Trial 6
construction   K         P         N         PN        K         P         N         PN
K              .90       .43       .21       .08       .92       [illeg.]  .00       .04
P              .42       [illeg.]  .28       .32       .10       .88       .01       .06
N              [illeg.]  .12       .72       .49       .08       .09       .93       .31
PN             .03       .19       .97       .75       .03       .14       .24       .96

TABLE 10
PROPORTION OF POSSIBLE PRESSES TO TEST LIST SENTENCES, EXPERIMENT 4,
PRESENT PERFECT SENTENCES (GROUP 2)

                                  Training list construction
Test list                  Trial 1                                 Trial 6
construction   K         P         N         PN        K         P         N         PN
K              .72       .49       .29       .15       .89       .21       .14       .04
P              .57       [illeg.]  .24       .32       .32       .90       .07       .12
N              .29       .26       .74       .56       [illeg.]  .10       .88       .28
PN             .25       .30       .61       .81       .07       .12       .95       .92

The tense effect indicates that there was
more generalization among present perfect
sentences than among simple past sen-
tences, presumably because of the greater
physical similarity of the former. The 
categories effect indicates that certain 
pairs of constructions were more often 
confused than other pairs of constructions. 
The trials effect simply indicates that 
generalization decreased over trials, and the 
Trials X Categories interaction indicates 
that generalization decreased differentially 
for the various categories. The graph in 
Figure 7 may help to clarify this interaction.
Here the various categories of
sentences were combined into three categories,
namely test list sentences whose
construction differed from their training
list construction by just the passive transformation
(Pass), test sentences that
differed by just the negative transformation
(Neg), and test sentences that differed
by both transformations (Pass + Neg).
The proportion of possible presses made
to the 12 sentences in each of these categories
is plotted against trials. It seems
that the relative number of erroneous
recognitions of test sentences differing
from training sentences by just the passive
transformation decreased more rapidly
with trials than did the relative number
of erroneous recognitions of sentences in
the other categories. This may simply reflect
the fact that the test list sentences
which differed from training list sentences
by the negative transformation
reached an asymptotically low level of
responding in the later test trials.

Fig. 7. Percentage of possible responses to test sentences differing from training sentences (abscissa: trials).

The significant Tense × Categories interaction
seems to indicate that the pattern
of generalization differed between the
simple past and the present perfect sentences.
An examination of the scaled data,
presented next, will aid in the interpretation
of this interaction.

Scaled data. The data from each trial 
of each group were treated in precisely 
the same fashion as were the data from 
each group in Experiment 3. The interpoint
distances among the K, P, N, and PN
constructions obtained in Experiment 2 
were also scaled in the same way as the 
Experiment 2 data were scaled in the Ex- 
periment 3 analysis. All the resulting 
configurations could be said to exist in 
real space. The largest negative root was 
obtained in the data of Group 2, Trial 5, 
and its absolute value was equal to only 
0.2% of the value of the sum of the posi- 
tive roots. 
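
The criterion just described (the size of the largest negative root relative to the sum of the positive roots) can be illustrated with a minimal sketch. It assumes that the roots are the latent roots of the double-centered scalar product matrix, as in metric scaling of the Torgerson (1958) type; the function names and the four-point distance matrix are hypothetical and are not taken from the experiments.

```python
import numpy as np

def scalar_product_roots(distances):
    """Latent roots of the double-centered scalar product matrix, largest first."""
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Scalar products about the centroid, computed from squared distances.
    b = -0.5 * centering @ (d ** 2) @ centering
    return np.linalg.eigvalsh(b)[::-1]

def negative_root_share(roots):
    """Absolute value of the most negative root over the sum of positive roots."""
    positive_sum = roots[roots > 0].sum()
    most_negative = min(roots.min(), 0.0)
    return abs(most_negative) / positive_sum

# Hypothetical distances among K, P, N, PN: corners of a 1.0 x 1.5 rectangle.
diagonal = np.hypot(1.0, 1.5)
example = [
    [0.0, 1.0, 1.5, diagonal],
    [1.0, 0.0, diagonal, 1.5],
    [1.5, diagonal, 0.0, 1.0],
    [diagonal, 1.5, 1.0, 0.0],
]
roots = scalar_product_roots(example)
share = negative_root_share(roots)
print(roots, share)
```

Because the hypothetical distances are exactly Euclidean, the share is zero up to rounding error; a value such as the 0.2% reported above would indicate only a slight departure from a configuration in real space.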

The K-P-N-PN subconfiguration from 
Experiment 2, and the configurations ob- 
tained in the early trials of Experiment 4, 
were essentially two-dimensional. How- 
ever, the configurations of the data from 
the later trials of Experiment 4 do not 
seem to be two-dimensional. It becomes 
more and more necessary to use a third 
dimension in the representation of the 
configurations of the later trials. For the 
subconfiguration of Experiment 2, the 
sum of squares of the scalar product matrix 
derived from the first two factors ex- 
tracted equaled 100.0% of the sum of 
squares of the original scalar product 
matrix which was factored. For Group 1 
of Experiment 4, the corresponding values
were: Trial 1, 99.9%; Trial 2, 98.8%;
Trial 3, 96.7%; Trial 4, 89.1%; Trial 5,
90.4%; and Trial 6, 86.0%. For Group 2 of
Experiment 4, the values were: Trial 1,
99.9%; Trial 2, 99.6%; Trial 3, 99.7%;
Trial 4, 97.0%; Trial 5, 95.1%; and Trial
6, 91.2%. This increase in the dimensionality
of the configurations over trials is
reflected in the Trials × Categories interaction
obtained in the analysis of variance.
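
The percentages above compare the sum of squares of the scalar product matrix rebuilt from the first two factors with the sum of squares of the matrix that was factored. The following sketch shows one way such a figure could be computed; the function names and the coordinates are hypothetical, and `depth` stands for an assumed third-dimension coordinate that is made larger to mimic the later trials.

```python
import numpy as np

def low_rank_share(b, k=2):
    """Share of the sum of squares of b reproduced by its k largest factors."""
    b = np.asarray(b, dtype=float)
    roots, vectors = np.linalg.eigh(b)
    keep = np.argsort(roots)[::-1][:k]           # indices of the k largest roots
    # Scalar product matrix rebuilt from the retained factors only.
    b_k = (vectors[:, keep] * roots[keep]) @ vectors[:, keep].T
    return float((b_k ** 2).sum() / (b ** 2).sum())

def rectangle_with_depth(depth):
    """Scalar products for four hypothetical points with a small third coordinate."""
    x = np.array([
        [-0.5, -0.75,  depth],
        [ 0.5, -0.75, -depth],
        [-0.5,  0.75, -depth],
        [ 0.5,  0.75,  depth],
    ])
    return x @ x.T

early = low_rank_share(rectangle_with_depth(0.1))
late = low_rank_share(rectangle_with_depth(0.4))
print(f"early trials: {100 * early:.1f}%  later trials: {100 * late:.1f}%")
```

As the hypothetical third coordinate grows, the two-factor share falls, which is the pattern the reported percentages show over trials.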

The translated and rotated two-dimensional
representations of the configurations
obtained in Experiment 4, and of the
Experiment 2 subconfiguration, are presented
in Figure 8. The third dimension
is represented by vectors attached to the
labeled points in the two-dimensional configurations.
The configurations were not
rotated or translated in the third dimension.
The length of each vector equals the
value of the coordinate of each point in
the third dimension; a vector extending
to the left and down from the point indicates
a positive value of the projection,
and a vector extending to the right and
up indicates a negative value.

Fig. 8. Experiment 4 configurations, Trials 1-6,
with Experiment 2 subconfiguration.

The sizes of the configurations may be 
compared across trials and across groups. 
Even ignoring the growth of the third 
dimension, there is an orderly growth in 
the size of the configurations over the 
trials, indicating the smaller number of 
confusions made in the later trials. Fur- 
ther, the configurations of the sentences 
shown Group 2 are generally smaller than 
the configurations of the sentences shown 
Group 1, indicating the greater number of 
confusions made among present perfect 
sentences than among simple past sen- 
tences. 

There is a good deal of similarity in 
shape among the two-dimensional repre- 
sentations of the configurations. A passive 
dimension and a negative dimension are 
identifiable in each configuration. Also, the 
distance between negative and affirmative 
is greater in each configuration than is 
the distance between passive and active. 
These similarities confirm the conclusion 
reached in Experiment 3 that very much 
the same similarity relations among sen- 
tences are being measured in the different 
variations of the recognition technique. 
Further, the similarities between the con- 
figurations of Groups 1 and 2 in Experi- 
ment 4 indicate that the similarity rela- 
tions being measured are, at least, not 
entirely dependent upon the formal (physi- 
cal) similarities among the sentences, 
since Group 1 sentences were in the simple 
past while the Group 2 sentences were in 
the present perfect. This difference in tense 
markedly changes the physical relation- 
ships among the sentences (contrast “The 
boy hit the ball”; “The boy didn’t hit the 
ball” with “The boy has hit the ball”; 
“The boy hasn’t hit the ball”) while leav- 
ing the underlying grammatical relation- 
ships unchanged. 

As in Experiment 3, however, there are
differences among the configurations
which should not be overlooked. The
smaller size of Group 2 configurations has
already been commented upon. Upon examination,
at least one further difference
becomes apparent. In the Group 1 configurations,
the distance between K and P
is almost always greater than the distance 
between N and PN. Clifton et al. (1965) 
also found a greater (though not signifi- 
cantly greater) distance between K and P 
than between N and PN. In the Group
2 configurations, this difference in distances
is minimal and less consistent than
in the Group 1 configurations. It is possible
that the significant Tense × Sentence
categories interaction reported earlier reflects
the variation between groups in this
difference. It is interesting to note that
the Experiment 2 subconfiguration is more
similar in shape to the Group 2 configurations
than to the Group 1 configurations
in Experiment 4, even though the sentences
in the latter configurations were in
the same tense as the Experiment 2 sentences.

A consideration of the possible reasons
for these discrepancies, and for the growth
in the third dimension over trials, will be
reserved for the discussion. However, before
the discussion, we will present a reanalysis
of data reported by Mehler
(1963). Mehler’s study further extends the
range of techniques with which essentially
the same similarity relationships among
the sentence constructions being investigated
are obtained.


The Mehler Experiment 


Mehler (1963) reported the results of an 
experiment in which he studied the syn- 
tactic errors in the free recall of sentences. 
The reader is referred to Mehler’s report
for a complete description of the procedure.
Essentially, Mehler presented his
Ss with a group of the K, P, N, ..., PNQ
sentence constructions, each sentence representing
a different sentence family. All
his sentences were in the present perfect
tense, for example, “The man has bought
the house” and “Hasn’t the secretary
typed the paper?” The list of sentences
was presented for five trials, the S being 
requested to recall all the sentences he had 
heard after each trial. 

Mehler scored the sentences which were
recalled as correct (if identical to a presented
sentence, or differing only in tense,
in the replacement of the definite article 
by the indefinite article, or in the replace- 
ment of a word by a synonym), as syn- 
tactically erroneous (if the recalled sen- 
tence was a different member of the same 
P,N,Q sentence family as a presented 
sentence), or as otherwise erroneous. 

Correctly recalled sentences and syntactically
erroneous sentences can be classified
in the 8 × 8 sentence-construction
by sentence-construction matrix used
in Experiments 1 and 2. The classification
of a recalled sentence depends on the con- 
struction in which it was presented (the 
rows in the matrix) and the construction 
in which it was recalled (the columns in 
the matrix). The frequencies with which 
recalled sentences fall into each classifica- 
tion can be entered into the cells of this 
matrix. Mehler presented just such a 
matrix as Table 1 of his article. 

The frequency measures in such a
matrix may be considered to be similarity
measures, and scaled using the Kruskal
technique. However, it seemed wise to
apply certain corrections to the data before
scaling. First, since not every sentence
presented was recalled correctly or
with only a syntactic error, the row frequencies
in the frequency matrix (the
frequencies with which sentences presented
in a given construction were recalled,
regardless of the construction in
which they were recalled) were unequal.
Therefore, the frequency $f_{ij}$ in each
cell was divided by the appropriate row
frequency $\sum_{j=1}^{8} f_{ij}$ to obtain proportion
measures $p_{ij}$ for each cell. Second, there
seemed to be tendencies to recall sentences
in certain constructions more than
in others, regardless of the constructions
in which the sentences were presented;
for example, there was a definite tendency
toward recalling sentences in the K form.
That is, there were differences among the
column frequencies. Therefore, each $p_{ij}$
entry was divided by the main diagonal
entry in the same column, $p_{jj}$, yielding
corrected confusion values, $c_{ij}$. This particular
correction was used in order to
make the entries on the main diagonal
equal to 1, indicating perfect similarity
between any sentence and itself. Finally,
the symmetrical corrected confusion values
$c_{ij}$ and $c_{ji}$ were averaged, yielding a
halfmatrix of averaged corrected confusion
values (Table 11).
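
The three corrections just described can be sketched in a few lines. This is an illustration under stated assumptions rather than the computation actually run on Mehler's data: the function name is hypothetical, and the 3 × 3 frequency matrix is invented solely to show the arithmetic (rows are the constructions presented, columns the constructions recalled).

```python
def corrected_confusions(freq):
    """Return the symmetric matrix of averaged corrected confusion values."""
    n = len(freq)
    # p_ij = f_ij / (row total): share of recalls of row i that took form j.
    p = [[f / sum(row) for f in row] for row in freq]
    # c_ij = p_ij / p_jj: rescale each column so the main diagonal equals 1.
    c = [[p[i][j] / p[j][j] for j in range(n)] for i in range(n)]
    # Average the symmetric entries c_ij and c_ji.
    return [[(c[i][j] + c[j][i]) / 2 for j in range(n)] for i in range(n)]

# Invented recall frequencies for three constructions (illustration only).
example = [
    [40, 6, 4],
    [10, 30, 5],
    [8, 2, 30],
]
averaged = corrected_confusions(example)
for row in averaged:
    print(["%.3f" % value for value in row])
```

Dividing by the diagonal entry of the same column, rather than of the same row, is what removes the bias toward recalling particular constructions, since that bias inflates every entry in the favored column.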

The entries in this halfmatrix were cor- 
related with the corresponding halfmatrix 
entries of Experiments 1 and 2. The correlation
between Mehler’s mean $c_{ij}$ values
and the Experiment 1 comparative distances
proved to equal −.66, and the
correlation between the Mehler values and
the Experiment 2 converted scores
equaled +.83. The magnitude
of these correlations, in particular
the latter correlation, is surprising when 
it is recalled that Mehler’s sentences were 
of different content than the sentences in 
the other studies and that the former and 
not the latter contained a form of the 
auxiliary verb “have.” It again appears 
that the various measures of sentence simi- 
larity are tapping the same, very stable, 
syntactic property of sentences. 

TABLE 11
AVERAGE CORRECTED CONFUSION VALUES: MEHLER

         K       P       N       Q       PN      PQ      NQ
P      0.097
N      0.091   0.013
Q      0.081   0.021   0.081
PN     0.020   0.065   0.135   0.028
PQ     0.027   0.155   0.031   0.108   0.095
NQ     0.068   0.038   0.097   0.232   0.044   0.091
PNQ    0.011   0.070   0.006   0.067   0.092   0.294   0.167


TABLE 12
ROTATED CONFIGURATION: MEHLER

Construction   Passive   Negative   Question
K                0.00      0.00       0.00
P                1.33      0.00       0.00
N               -0.02      1.33       0.00
Q                0.00      0.69       1.14
PN               1.32      1.35      -0.03
PQ               1.33      0.70       1.14
NQ               0.00      0.69       1.14
PNQ              1.34      0.70       1.14


Fig. 9. Two dimensional projections of Mehler configuration, with oblique projection inset.

The halfmatrix of averaged corrected
confusion values, less the main diagonal,
was scaled with the Kruskal technique.
As was the case in Experiment 2, two-dimensional
solutions were less than satisfactory.
With the Euclidean model, the
best fitting configuration had a stress of
17.1% while the best city block configuration
had a stress of 8.6%. In three dimensions,
excellent fits were obtained in both
Euclidean and city block spaces. The best
fitting Euclidean configuration had a stress
of 0.0%, while the best fitting city block
configuration had a stress of 2.1%. For
precisely the reasons given in Experiment
2, only the Euclidean configuration will be
considered here.
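
For reference, the two quantities compared in this paragraph may be written out. The expressions below are the usual statement of the distance function and of stress in Kruskal's (1964a) nonmetric procedure; they are supplied here as background and are not printed in the present report. The symbols $x_{ik}$ (coordinate of construction $i$ on dimension $k$), $t$ (number of dimensions), and $\hat{d}_{ij}$ (the fitted values constrained to be monotone with the similarity measures) are notation assumed for this illustration.

```latex
% Distance between constructions i and j in t dimensions
d_{ij} = \Bigl( \sum_{k=1}^{t} \lvert x_{ik} - x_{jk} \rvert^{r} \Bigr)^{1/r},
\qquad r = 2 \ \text{(Euclidean)}, \quad r = 1 \ \text{(city block)}

% Stress of a configuration, usually reported as a percentage
S = \sqrt{ \frac{ \sum_{i<j} \bigl( d_{ij} - \hat{d}_{ij} \bigr)^{2} }
               { \sum_{i<j} d_{ij}^{2} } }
```

A stress of zero therefore means that the rank order of the distances in the configuration agrees perfectly with the rank order of the similarity measures.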

The zero stress Euclidean configuration 
was translated and rotated to the position 
of the configuration obtained in Experi- 
ment 2. The coordinates of the obtained 
configuration are presented in Table 12. 
As in Experiment 1 and Experiment 2, it 
is possible to label the dimensions “passive,”
“negative,” and “question,” and once
again it is noted that the questions fall 
about midway between the affirmative non- 
questions and the negative nonquestions. 

The two-dimensional and oblique pro- 
jections of the configuration are shown in 
Figure 9. It is apparent that the obtained 
configuration is precisely the regular figure 
that our interpretation of the phrase-structure
analysis implied. The points corresponding
to affirmative questions and negative
questions have almost the same location
in the configuration, while the distance be- 
tween active and passive questions is ap- 
proximately the same as the distance be- 
tween active and passive nonquestions. The 
questions are midway between affirmative 
and negative nonquestions, and the distances 
between passive and active, and between 
question and nonquestion, are approximately 
equal to the distance between affirmative 
and negative nonquestions. 

One possible difficulty with the obtained
scaling becomes apparent upon
examination of a plot of distance values 
in the configuration against the similarity 
measures. As in Experiment 1, this plot is 
a “step function.” Here, similarity val- 
ues within one of three rather wide ranges 
of value are associated with one of three 
values of distance. Apparently, in the 
Mehler data, there were two clusters of 
sentence constructions (Q and NQ; PQ 
and PNQ) in which the constructions 
were very close to each other, relative to 
their proximities to the remaining con- 
structions, while the remaining construc- 
tions were well separated from each other. 
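
The step function can be examined directly from the printed tables. The sketch below computes the Euclidean distances among the Table 12 coordinates and groups the Table 11 similarity values by distance. The values are transcribed from the tables as printed; the two cut points in `step` were chosen by eye for this illustration and are an assumption, not part of the original analysis.

```python
from itertools import combinations
from math import dist

ORDER = ["K", "P", "N", "Q", "PN", "PQ", "NQ", "PNQ"]

# Table 12: (passive, negative, question) coordinates.
COORDS = {
    "K": (0.00, 0.00, 0.00), "P": (1.33, 0.00, 0.00),
    "N": (-0.02, 1.33, 0.00), "Q": (0.00, 0.69, 1.14),
    "PN": (1.32, 1.35, -0.03), "PQ": (1.33, 0.70, 1.14),
    "NQ": (0.00, 0.69, 1.14), "PNQ": (1.34, 0.70, 1.14),
}

# Table 11: averaged corrected confusion values, rows P through PNQ.
HALF_MATRIX = [
    [0.097],
    [0.091, 0.013],
    [0.081, 0.021, 0.081],
    [0.020, 0.065, 0.135, 0.028],
    [0.027, 0.155, 0.031, 0.108, 0.095],
    [0.068, 0.038, 0.097, 0.232, 0.044, 0.091],
    [0.011, 0.070, 0.006, 0.067, 0.092, 0.294, 0.167],
]

similarity = {}
for r, values in enumerate(HALF_MATRIX, start=1):
    for c, value in enumerate(values):
        similarity[(ORDER[c], ORDER[r])] = value

def step(distance):
    # Cut points chosen by eye for illustration; they are not from the source.
    if distance < 0.5:
        return "near"
    if distance < 1.6:
        return "one step"
    return "two steps"

ranges = {}
for pair in combinations(ORDER, 2):
    label = step(dist(COORDS[pair[0]], COORDS[pair[1]]))
    ranges.setdefault(label, []).append(similarity[pair])

summary = {label: (len(v), min(v), max(v)) for label, v in ranges.items()}
for label, (count, low, high) in summary.items():
    print(f"{label:9s} n={count:2d}  similarity {low:.3f} to {high:.3f}")
```

With the tabled values, the two pairs at near-zero distance are Q with NQ and PQ with PNQ, and the three similarity ranges do not overlap, which is what a step function relating similarity to distance implies.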

The “step function” obtained might
lead one to feel that, in the present analy- 
sis, not all the information in the data is 
being used in the construction of the 
spatial configuration. However, whether or 
not it is thought that a more powerful 
treatment of the data would provide a
more refined estimate of the true similari- 
ties among the constructions, it seems safe 
enough to accept the gross implications of 
the configuration obtained. The conclu- 
sion that the configuration supports our 
tentative interpretation of the Katz-Postal 
phrase-structure analysis of syntactic re- 
lationships is inescapable. 


Discussion 


In the studies reported here, a variety
of measurement techniques—techniques 
for determining judged similarity, recog- 
nition errors, and reproduction errors— 
were applied to the problem of ascertain- 
ing psychological similarity among sen- 
tence constructions. The sentence con- 
structions investigated were those which 
were members of P,N,Q sentence families. 


In experiments using a recognition tech- 
nique (Experiments 2 and 3) much more 
confusion was obtained among members of 
one sentence family than between members 
of different sentence families, indicating 
some relatively close relationships among 
sentences which are grammatically related. 
Of greater interest, a consistent pattern 
of similarity relationships was found 
among the members of a sentence family. 
Very similar patterns were found in the 
three experiments that simultaneously in- 
vestigated all eight members of a sentence 
family (Experiments 1, 2, and the Mehler 
study). When separate subsets of the sen- 
tence constructions were examined in Ex- 
periments 3 and 4, relationships were ob- 
tained that were similar to those obtained 
in the other studies, although certain devi- 
ations were noted. 

Multidimensional scaling techniques 
were used to analyze the patterns of rela- 
tionships obtained. These scaling tech- 
niques resulted in multidimensional con- 
figurations of sentence constructions, where 
distance in a configuration was related to 
the dissimilarity of the sentence construc- 
tions. Each configuration could be dimen- 
sionalized in terms of the manner of 
grammatical relatedness of the sentence 
constructions. That is, one dimension along 
which sentence constructions were dis- 
placed from one another corresponded 
to the difference between active and pas- 
sive sentences, another to the difference 
between affirmative and negative sentences, 
and a third dimension to the difference 
between nonquestion and question sen- 
tences. With certain exceptions in the case 
of the question sentences, the dimensions 
along which two constructions in the con- 
figuration were displaced from each other 
were those identified with the particular 
grammatical characteristics differentiating 
the sentences. Thus, grammatically less 
closely related constructions were generally 
further apart in the configuration than 
more closely related constructions. 

Deviations from the dimensionalization 
in terms of apparent grammatical differ- 
ences appeared among the questions. Affirm- 
ative questions and negative questions 
were consistently close together in the
configurations. Further, affirmative questions
and negative questions all had approximately
the same projection on the
affirmative-negative dimension, falling 
about half-way between affirmative and 
negative nonquestions on this dimension. 

The obtained relationships involving 
questions were not those expected on the 
basis of a transformational grammatical 
analysis of the constructions involved (see 
Figure 1 for a graphical presentation of 
these implications). However, they were 
very nearly those expected on the basis of 
an interpretation of the Katz and Postal 
(1964) phrase-structure analysis of the 
constructions (Figure 2). 

Another characteristic in which the con- 
figuration obtained deviated from that 
which might be expected on the basis of 
the transformational analysis was the 
generally greater size of the deviations 
along the affirmative-negative dimension, 
relative to the size of the deviations along 
the active-passive dimension. The passive 
transformation is a more complex trans- 
formation than the negative transforma- 
tion (in terms of the number of symbols 
in the string being transformed that 
undergo change, i.e., the number of ele- 
mentary transformations involved). One 
might thus expect to find a greater per- 
ceived difference between sentences differ- 
ing by the passive transformation than 
between sentences differing by the negative 
transformation. Such was not the case. The 
Katz and Postal analysis seemed to make 
no prediction regarding the relative sizes 
of the deviations along the dimensions. 

Finally, it may be pointed out that 
the configurations appeared to be better 
described in a Euclidean space, as implied 
by the Katz and Postal analysis, than in 
the non-Euclidean space indicated by the 
transformational analysis. However, the 
scaling techniques used do not provide a
really satisfactory basis for choosing the 
better of the two spatial models. 

One might ask, does the pattern of 
similarity among sentences really reflect 
the grammatical relationships among the 
sentences? Might it not instead be reflecting
the relationships among the sentences
in meaning, or simply in phonetic or
graphemic (physical) similarity? The
question is not easily answered. One 
might point to the success achieved by the 
grammatical analysis. Such success is not 
presently possible by an analysis in terms
of meaning, simply because no method 
exists for specifying the relative closeness 
in meaning of different sentence construc- 
tions. However, certain points in the data, 
such as the greater deviations along the 
affirmative-negative dimension than along 
the active-passive dimension, seem likely 
to be congruent with a meaning analysis. 
Also, it should be remembered that the 
Katz and Postal (1964) phrase-structure 
analysis was designed with an eye toward 
an analysis of the meaning of sentences. 
One could work toward a determination of 
the role of meaning by investigating sen- 
tence constructions which are grammati- 
cally related in ways other than those in- 
vestigated to see if predictions made on 
the basis of grammatical relationships con- 
tinue to hold. In addition, it would be of 
interest to investigate sentences which ap- 
pear to be related semantically but not 
grammatically, for example, sentences in 
which words are replaced by synonyms 
or by opposites. 

Unlike the case of semantic similarity, 
there is no lack of ways in which to specify 
the physical similarity of sentences. In 
fact, the problem here lies in choosing 
among the many alternatives. Rather than 
attempting to defend the choice of any 
particular measure of physical similarity, 
however, we shall simply point out two 
aspects of the data which seem to indicate 
the insufficiency of an explanation in terms 
of physical similarity. 

First, the obtained pattern of perceived 
similarities was much the same, whether 
simple past or present perfect sentences 
were investigated (contrast Experiments 1 
and 2 with the Mehler study, and Group 
1, Experiment 4 with Group 2, Experiment 
4). The grammatical relationships among 
the sentence constructions remain the same 
regardless of the tense of the sentences, 
while the physical relationships change 
markedly. Among the simple past sentences,
for instance, the physical difference
between affirmative and negative nonques- 
tions seems to be far greater than the 
difference between affirmative and negative 
questions (compare “John hit the ball”; 
“John didn’t hit the ball” with “Did John 
hit the ball?”; “Didn’t John hit the ball?”).
However, it is hard to find a comparable 
physical contrast among the corresponding 
sentences in the present perfect (“John 
has hit the ball”; “John hasn’t hit the 
ball” versus “Has John hit the ball?”;
“Hasn’t John hit the ball?”). Nevertheless, 
it was consistently found that the questions 
are psychologically very similar, while the 
nonquestions are very dissimilar, regardless 
of the tense of the sentence. 

Second, there are cases in which the con- 
structions of one pair of sentence con- 
structions are judged to be more similar, or 
are more frequently confused, than the 
constructions of a second pair, while the 
constructions of the second pair are quite 
obviously more similar physically than the 
constructions of the first pair. For instance, 
the active and the passive are generally 
perceived as being more similar than the 
affirmative and the negative, while it seems 
clear that the affirmative and the negative 
are physically more similar than the active 
and the passive (compare “The man 
closed the box”; “The box was closed by 
the man” with “The man closed the box”; 
“The man didn’t close the box”). 

This is not to say that physical similar- 
ity is of no importance in the present re- 
sults. The greater number of confusions 
among sentences in the present perfect 
(Group 2, Experiment 4) than among 
sentences in the simple past (Group 1, Ex- 
periment 4) is consonant with the ap- 
parently greater physical similarity among 
present perfect sentences. Also, the fact 
that N and PN appeared to be closer 
together than K and P among simple 
past tense sentences, but not among the 
present perfect sentences (Experiment 4), 
may reflect an effect of physical similarity. 
(It is difficult, however, to point to any 
aspect of physical similarity which would 
account for this difference, aside from sentence
length.) Finally, it is possible that
the deviations between the relationships
obtained in Experiment 3 and the corre- 
sponding relationships obtained in Experi- 
ment 2 may reflect some kind of an inter- 
action between physical similarity and 
the context in which the sentences are 
seen. The sentence constructions which 
were perceived as relatively more similar 
in Experiment 3 than in Experiment 2 
are generally sentences which differ only 
in word order (e.g., P versus PQ, “The
cat was chased by the dog” versus “Was the
cat chased by the dog?” and PN versus
PNQ, “The pipe wasn’t dropped by the
plumber” versus “Wasn’t the pipe dropped
by the plumber?”). Perhaps the identity
of the individual words in these con- 
structions assumes an important role in 
determining the number of confusions 
made when a smaller variety of sentence 
constructions is seen in the context of the 
experiment. 

It will be recalled that, in Experiment 
4, it was necessary to consider the con- 
figurations obtained in the later trials as 
being three-dimensional. That is, there 
was a growth over trials in the size of a 
third dimension which did not reflect any 
grammatical characteristic of the sen- 
tences. In effect, the sentence construc- 
tions became more nearly equidistant 
in the later trials. One might speculate 
that, in these later trials, Ss are no longer 
discriminating on the basis of grammatical 
structure, but have selected certain unique 
characteristics of each sentence and are
reacting to these characteristics. That is, 
whatever confusions occur among sen- 
tences on the later trials are traceable to 
similarities among certain (possibly phys- 
ical) characteristics of the sentences, 
rather than to the grammatical relation- 
ships among the sentences. Alternatively, 
one might simply say that a minimum 
level of confusions is being approached 
in the later trials for all sentence con- 
structions, implying that all constructions 
must appear to be equidistant from one 
another. Finally, one might suggest that
what exists here is a scaling problem, 
where the function relating measured similarity
to distance in the configuration is
really, say, exponential, and only appeared
to be linear in the range of similarity 
values obtained in Experiment 2. 

Further points could be discussed. For
instance, it has been hypothesized (Mehler,
1963; Miller, 1962) that the K construction
is somehow central, that is, that
it is basic to the other constructions. (A
more proper statement would be that the 
terminal string underlying the K is basic 
to the strings underlying the other con- 
structions.) Such an hypothesis is con- 
sonant with a transformational analysis of 
sentence relationships, but not with the 
analysis that indicates the different sentence
constructions to be derived by different
phrase-structure rules. In the present
analyses, it did not appear that the
K construction was in any special way
distinguished from the other constructions.
Actually, any distinction that the K
might have had might have been obscured
by the corrections for response
bias employed in the analyses. In Experiments
2, 3, and 4, this correction took
the form of an adjustment for tendencies 
to “recognize” sentences in some con- 
structions more than in others (although 
it should be noted that there seemed to 
be no bias toward "recognizing" K sen- 
tences: Clifton, 1964; Odom, 1964). In the 
analysis of the Mehler study, the correction
adjusted for a tendency which did
exist to recall sentences in the K form. It 
might be suggested that K is distinguished 
from the other forms not on any grammati- 
cal basis, but simply on the basis of its 
greater frequency of use in the language 
or perhaps its shorter length, and thus 
that the application of the corrections was 
legitimate for our purposes. 

Finally, the topic of the relation between 
a generative grammar and the linguistic 
abilities of a speaker could be discussed. 
However, the present data justify no new 
strong assertions about this relationship. It
is perhaps sufficient to point out that the 
studies reported here give substantial evi- 
dence for the existence of some parallel 
between the linguistic description of a 
language and the reactions of a language 
user to his language. 


SUMMARY 


Certain aspects of modern generative 
grammars were discussed. The implica- 
tions of two types of grammars for syn- 
tactic relationships among sentence con- 
structions were examined. One type of 
grammar treats certain related construc- 
tions as being transformational variants 
of a single “terminal string,” and thus as 
being related to each other by sets of 
transformations. This type of grammar 
indicates that certain sentence construc- 
tions would be related to each other by 
an inverse function of the number of 
transformations by which they (or better, 
their underlying strings) differ. This 
type of relationship may be represented 
graphically in the cube of Figure 1, for 
sentences formed using some combination 
of the passive, the negative, and the 
question transformations.
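
The relationship implied by this first type of grammar can be made concrete with a short sketch. It assumes the simplest reading of the account above, in which the distance between two constructions is just the number of transformations (passive, negative, question) by which they differ; the names `CONSTRUCTIONS` and `transformation_distance` are hypothetical.

```python
from itertools import combinations

# Each construction is identified by the transformations applied to K.
CONSTRUCTIONS = {
    "K": frozenset(),
    "P": frozenset("P"),
    "N": frozenset("N"),
    "Q": frozenset("Q"),
    "PN": frozenset("PN"),
    "PQ": frozenset("PQ"),
    "NQ": frozenset("NQ"),
    "PNQ": frozenset("PNQ"),
}

def transformation_distance(a, b):
    """Number of transformations by which two constructions differ."""
    return len(CONSTRUCTIONS[a] ^ CONSTRUCTIONS[b])

distances = {
    (a, b): transformation_distance(a, b)
    for a, b in combinations(CONSTRUCTIONS, 2)
}
for steps in (1, 2, 3):
    pairs = [f"{a}-{b}" for (a, b), d in distances.items() if d == steps]
    print(steps, len(pairs), " ".join(pairs))
```

On this reading the 28 pairs fall on the edges, face diagonals, and body diagonals of a cube, and Q and NQ are exactly as far apart as K and N. It is at this point that the phrase-structure account described next makes a different prediction.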

Another type of grammar, exemplified by 
the grammar presented by Katz and Pos- 
tal (1964), treats apparently related 
sentences as being related by virtue of 
their phrase-structure derivations. Spe- 
cifically, it can be argued that sentence 
constructions are syntactically related if 
their underlying strings differ only in “uni- 
versal” morphemes and that the closeness 
of the relationship is an inverse function 
of the number of universal morphemes 
by which these strings differ. It was 
argued that this phrase-structure treat- 
ment indicates grammatical relationships 
among sentence constructions which are 
similar to those indicated by the transfor- 
mational approach, with certain import- 
ant exceptions. Specifically, in the case of 
the sentence constructions presented in 
Figure 1, the Q and NQ, and the PQ and 
PNQ, are very similar to each other, and 
questions in general are approximately 
equally closely related to affirmative and 
negative nonquestions. These relation- 
ships were presented graphically in Figure 
2. 


A series of studies was carried out to
determine empirically the perceived similarity
of certain sentence constructions.
Each study yielded a matrix of similarity
or dissimilarity measures, in which the
rows and columns referred to sentence 
constructions. These matrices were ana- 
lyzed using multidimensional scaling tech- 
niques, either metric techniques described 
by Torgerson (1958) or a nonmetric 
technique proposed by Kruskal (1964a, 
1964b). These analyses resulted in multi- 
dimensional spatial configurations which 
could be compared with those predicted by 
the two grammatical analyses. 

The first study used a judgment tech- 
nique (method of multidimensional rank 
order) to determine the similarity of eight
sentence constructions (K, P, N, Q, PN, PQ, NQ,
PNQ). The resulting data did not scale
in a satisfactory manner when the Krus-
kal nonmetric technique was used. Ap-
parently, subsets of the sentence con- 
structions were more similar to one another 
than they were to any constructions 
outside the subsets, a property of the data 
that results in an excessively low dimen- 
sionality of the scaled configurations. 
However, when the data were scaled us-
ing the metric technique, a three-dimen-
sional configuration emerged that was 
very similar to the one indicated by the 
Katz and Postal phrase-structure anal- 
ysis, with the corrective that passives 
and actives were highly similar to each 
other. 

The second study investigated the same 
eight constructions, using a recognition 
task, in which the confusions between 
related sentences in different construc- 
tions indexed the similarity of the con- 
structions. When these data were scaled 
using the Kruskal technique, a very satis- 
factory configuration was obtained. This 
configuration again was much like that 
predicted by the phrase-structure analy- 
sis, with certain irregularities of dubious 
reliability among the question construc-
tions.

In the third experiment, the sentence 
constructions investigated in Experiment 
2 were examined four at a time, using 
the recognition technique. The resulting
confusion data were converted to distances 
using the formula that was found to re- 
late the confusion measure to distance in 
Experiment 2, and these distances were
scaled using the Torgerson (1958) tech- 
nique. The five two-dimensional configura- 
tions which emerged were compared with 
the corresponding subconfigurations of the 
Experiment 2 configuration. Important 
similarities were found, namely, each con- 
figuration could be dimensionalized in 
terms of the phrase-structure relationships 
among the constructions; affirmative and 
negative questions were again very similar 
to each other and midway between affirma- 
tive and negative nonquestions; and the 
distance between affirmative and negative 
nonquestions was, as before, the greatest 
distance between constructions differing 
by only one transformation or one uni- 
versal morpheme. However, there were 
certain differences between the Experi- 
ment 3 configurations and the correspond- 
ing Experiment 2 subconfigurations. While 
these differences were of uncertain reliabil- 
ity and did not fit perfectly into any sum- 
marizing pattern, the best description of 
them would seem to be that sentences 
that differ only in word order are more 
often confused with each other when pre- 
sented in the context of a smaller (Experi- 
ment 3) rather than a larger (Experiment 
2) variety of sentence constructions. 

In the fourth experiment, confusions 
among another subset of four sentence 
constructions (K, P, N, and PN) and the 
changes in confusions over repeated 
training and test trials were investigated. 
Further, confusions among sentences in the 
simple past tense were compared with 
confusions among sentences in the present 
perfect. The data were analyzed as in 
Experiment 3. On the first trial, the ob- 
tained configurations were much the same 
as the corresponding subconfiguration of 
Experiment 2. There were generally more 
confusions between sentences in the pres- 
ent perfect than between sentences in the 
simple past, but the patterns of confu- 
sions were very similar. The one devia- 
tion noted was that, among simple past 
sentences, N and PN were more often con- 
fused than were K and P, while this did 
not hold for sentences in the present per- 
fect. Over trials, an increase in the di-
mensionality of the configurations was
noted. The constructions tended to be- 
come more nearly equidistant, and the con- 
figurations could no longer be described 
perfectly on the basis of the grammati-
cal relationships of the sentences. This
change was thought to have been due to 
a change in the manner by which the 
sentences were recognized, to a ceiling 
effect, or to a defect in the transforma- 
tion used to convert the confusion meas- 
ures to distances.

Finally, some data presented by Meh- 
ler for confusions among sentence con- 
structions in recall were reanalyzed using 
the Kruskal technique. The resulting 
configuration was precisely the regular 
configuration predicted by the Katz and 
Postal phrase-structure analysis. 

Over all studies, much the same relation- 
ships were obtained among the sentence 
constructions investigated. This asser- 
tion is supported by the similarity of the 
scaled configurations, and by the high cor- 
relations among the data obtained in Ex- 
periments 1 and 2 and by Mehler. In each 
configuration, constructions analyzed by 
the Katz and Postal phrase-structure gram-
mar as being more closely related were
closer together in the scaled configurations, 
and the configurations could be dimension- 
alized in terms of the grammatical re- 
latedness of the sentence constructions. 
These results indicated a powerful and 
consistent effect of grammatical relation- 
ships among sentences on their perceived 
similarity. 

An explanation of the results on the 
basis of physical similarity of the sentences 
was ruled out, primarily on the basis of 
the comparability of the results obtained 
with sentences in different tenses and on 
certain apparent inconsistencies between 
perceived and physical similarity. How-
ever, it was tentatively concluded that
physical similarity did have an effect over 
and above the effect of grammatical simi- 
larity. The possibility that the results 
could be due to the semantic similarity 
of the sentences rather than to their gram- 
matical relatedness was briefly considered, 
but it was suggested that no meaningful 
conclusions could be made at the present 
time because of the lack of a metric of 
semantic similarity of sentences. 
In 3 successive experiments, as representatives of management or labor, 256
graduate business students bargained individually with counterparts on 9 
issues. 2 of the 4 treatments of each experiment required groups of Ss to 
plan strategies or to study the issues without considering bargaining tactics. 
Various kinds of prenegotiation study groups were contrasted. Also, some 
Ss planned strategies or studied alone rather than in groups. In the 1st and
3rd experiments in which deadlines were imposed, those negotiators who
had prepared themselves by planning strategies were more likely to dead- 
lock, more so if they had planned in advance in groups rather than alone. 
Detailed analyses are presented of the effects of the treatments within each 
experiment on specific contract outcomes, the overall favorability to the 
company of the settlements, the departure of the agreements reached from 
community norms and the speed of settlement. The latter 2 outcomes (de- 
parture and speed) were highly correlated. A variety of treatment effects 
appeared, some of which were consistent across experiments. Also, agree- 
ment of each 2 negotiators on the relative importance of issues depended on 
prenegotiation treatment as did the judged importance of most of the 9 
issues and the postsettlement evaluation of the adequacy of the settlement 
reached. Personal orientation of the negotiators also affected outcomes. 
Thus, task-oriented negotiating pairs reached settlement closer to com- 
munity norms while self-oriented negotiating pairs tended to agree more 
closely on the importance of the issues. Company and union representatives
favored using different tactics with different concerns in mind.


Several rather independent approaches
have developed in the study of inter-
group conflict resolution. Economists, inter- 
ested in the exchange of value, have pursued 
rational and deductive formulations of the 
problem (Rapaport, 1960; Schelling, 1957; 
Boulding, 1962) with heavy emphasis on 
the mathematics of game theory. Political 
scientists like Mack and Snyder (1957) 
have extracted generalizations from surveys 
of historical materials. Psychologists like 
Sherif et al. (1961) and Blake and Mouton 
(1961) have focused on the socioemotional
aspects of in-group, out-group identifica-
tion or the implications of reinforcement
theory (Osgood, 1962).


1 This work was supported by Contract Nonr
Research, Many contributed to various phases of
ful in earlier analysis as were Sue Wolpert, Richard
Karppinen, and Walter McGhee in later stages.
Avi Porat was responsible for interviewing indi-
vidual negotiators and content analyses of their
reported tactics. I am also indebted to my pro-
fessional colleagues, James Vaughan and Raghu
Nath, for their cooperation and aid.


COMMON ELEMENTS OF BARGAINING


Intergroup conflict contains a number of 
common rational and emotional elements 
whether it occurs between nations, between 
union and management, or between de- 
partment heads of the same company who 
must negotiate transfer prices for goods 
that one head is transferring to the other. 

The conflicting groups share a common 
fate. Agreement must be reached if either 
party is to survive and prosper (Siegel & 
Fouraker, 1960). Rival nations, competing 
unions and managements, or contentious 
department heads must resolve their con- 
flict to avoid mutual social and economic 
losses. 

Typically, bargainers are engaged in a 
complex non-zero-sum game. Both lose if 
each seeks to maximize his own gain at the 
expense of the other. Both parties can
profit, although not maximally, by means 
of a cooperative solution. Both gain when 
they compromise at less than maximum 
return for each. Yet, there is no guarantee 
that the non-zero-sum game will produce 
cooperating bargainers. On the contrary, 
competitive strategies are often maintained 
to the detriment of all concerned (Scodel,
Minas, Ratoosh, & Lipetz, 1959). The bar- 
gainers will cooperate only if they can 
develop mutual trust through appropriate 
communications and if they are oriented
toward each other’s welfare (Deutsch, 
1957; Loomis, 1957). 
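
The structure of such a non-zero-sum game can be sketched with a small payoff table. The payoff values and the names `PAYOFFS`, `best_reply`, and `joint_gain` below are hypothetical, chosen only to show how individually maximizing moves can leave both bargainers worse off than mutual cooperation; they are not taken from the bargaining problem used in these experiments.

```python
# Hypothetical payoffs as (row bargainer, column bargainer); illustrative only.
PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "compete"): (0, 5),
    ("compete", "cooperate"): (5, 0),
    ("compete", "compete"): (1, 1),
}
MOVES = ("cooperate", "compete")

def best_reply(opponent_move):
    """Move that maximizes the row bargainer's own payoff against a fixed opponent move."""
    return max(MOVES, key=lambda move: PAYOFFS[(move, opponent_move)][0])

def joint_gain(row_move, col_move):
    """Sum of both bargainers' payoffs for a pair of moves."""
    return sum(PAYOFFS[(row_move, col_move)])

# Competing is the best individual reply to either move, yet mutual
# competition yields a smaller joint gain than mutual cooperation.
replies = {move: best_reply(move) for move in MOVES}
```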

Organizational commitments. The goals 
and constraints on our actions often are 
dictated by organizational considerations. 
Little variance in behavior is left to per-
sonal idiosyncrasy when we act as organiza-
tional representatives. Surprisingly, this
aspect of bargaining has been ignored
generally by those primarily interested in
the rational elements of bargaining. Never- 
theless, bargainers usually negotiate as 
representatives of their respective organiza- 
tions. Their bargaining is strongly influenced 
by their group commitments. The industrial 
relations director at a collective bargaining 
session is constrained to a great degree by 
higher management authority and his 
management peers with whom he may al- 
ready have marked out the limits of what 
he may do. The union representative knows 
he must strive to achieve a resolution satis- 
factory to the rank and file (Gouldner, 1954). 
Negotiators drawn from competing experi- 
mental groups in a zero-sum “you win, I 
lose” game are completely locked into con- 
flicting positions by group identifications. 
Hardly ever can two representatives agree 
which of the two groups they represent 
did the better job, for instance, in preparing 
an essay on an assigned topic. Each remains 
committed to his own group’s product. As 
group representatives, subjects are seriously 
biased in favor of their own group in the 
evaluation of the situation. Their flexi- 
bility is impeded by loyalty to their own 
group. Deviation from their own group 
position is treasonous. Their unwillingness 
to compromise is supported by fear of 
censure from their own group. Even after
studying opposing points of view, these
partisans see more divergences than actually
exist between their own and other positions.
The inability of a negotiator representing a
group to agree that his opponent’s group
did a better job is not necessarily a con-
scious bias out of fear of sanction by his
own members if he were to capitulate, for
these biased evaluations will appear to the
same degree even if complete secrecy is
maintained about the source of the deci-
sions (Blake & Mouton, 1961).

The significance of group commitment
to understanding the behavior of the in-
dividual negotiator representing the group
is emphasized by noting the much lower
degree of partisanship exhibited by the
individual bargainer who is representing
only himself. Thus, Vegas, Frye, and Cas-
sens (1964) showed that two individuals
competing as individuals about which one
wrote the best essay have relatively little
difficulty in agreeing that one or the other 
paper was best. Here, there is little over- 
evaluation of one’s own product and de- 
valuation of the opposing entry. This is in 
marked contrast to what happens when 
group representatives discuss the merits of 
their respective group products. The per- 
ceptual distortion which takes place in the 
evaluations of a negotiator who comes out 
of a group to represent it in the bargaining 
process is much less likely to appear when 
individuals are representing only themselves 
in negotiations. The inability of negotiators 
to reach agreement, to perceive issues in the 
same way, often lies in their group commit- 
ments, identifications, and loyalties. 


PURPOSE 


If negotiators from competing groups
were to be freed from these perceptual
distortions as well as conscious fears of
sanction, it was reasoned that we would
need to understand and control the group
process which ordinarily precedes negotia-
tions.

To develop such understanding was the
purpose of the three successively planned
experiments to be reported. The first ex-
periment set out to see if bargainers could
avoid the hardening of lines and commit-
ments to partisan positions, if before ne-
gotiations they met together for joint study
of the issues dividing them, spending their 
time in informative discussions rather than 
in tactical maneuvers or in the planning of 
such maneuvers. Making both sides study 
the issues together before beginning to 
bargain, it was argued, might increase their 
tendency to focus more on the common 
interests of both sides and less on difficul- 
ties that do not really exist. Also, such 
bilateral study among future negotiators 
offered all an opportunity to become ac- 
quainted personally with those with whom 
one would subsequently negotiate. 

This first experiment looked at bilateral
study among those who would subsequently
face each other in negotiations. Three other
situations were created and compared with
bilateral study in their effects on subsequent 
negotiations. In one such treatment, joint 
study was afforded, but not with persons 
who subsequently would be met in bargain- 
ing sessions. This was to see the effect of 
personal familiarity on the behavior of 
negotiators independent of the effects of 
joint study, per se. In another treatment, 
unilateral study groups were set up so that 
future negotiators studied the issues but 
only with members of their own side. A 
third condition attempted to simulate the 
ordinary prebargaining strategy meeting of 
groups facing forthcoming negotiations.
(The American Management Association,
1963, advertises a training course for execu-
tives in preparing “your strategy for pre-
senting management demands . . . how best
to set company goals in your prebargaining
sessions. . . .”)

When this first experiment found that 
unilateral study softened conflict as much 
as did joint study, it suggested the second 
experiment which examined what happens 
when one side studies the issues while the 
other plans strategies. Also, the second ex-
periment was conducted without deadlines
to see their importance to differential out- 
comes. The third experiment returned to the 
fundamental question of how important 
was group commitment to a negotiator’s 
attachment to a strategy. It compared 
negotiators who had studied the decisive 
issues alone or in unilateral groups before 
bargaining. It also compared those who 
planned strategies alone before bargaining 
with those who planned in groups before- 
hand. 


METHOD 


A non-zero-sum union-company bargaining 
game created by Campbell (1960) was modified and 
employed to test the differential effects of the 
treatments in the three experiments. Minor modifi-
cations made each experiment somewhat different,
so that, generally, statistical comparisons should 
be limited to treatments within the same experi- 
ment. 


The Problem 


All participants were given a page of back- 
ground information about the Townsford Com- 
pany, a small textile firm, and its union, conclud- 
ing with the paragraph: 


The three year contract has now expired.
Negotiations broke down in the final week with
both sides adamant in their positions. The only
agreement reached was that each side would
select a new bargaining agent to represent it,
scheduled to meet today (the first day of strike) 
in an attempt to reach a quick solution and 
avoid a long strike. 


Contract Issues 


There were nine issues for bargaining: hospital 
and medical plan, wages, sliding pay scales to con- 
form to cost of living, seniority, union representa- 
tive on the Board of Directors, nightshift differ- 
ential, vacation pay, establishment of a work rules 
committee, and a checkoff system. Each partici- 
pant received a graphic statement of the current 
union and company positions on each issue and 
the financial cost to the company in thousands 
of dollars for a 2-year period. For example, for 
the wages issue, it was as follows:


PAST CONTRACT: $1.94 per hour
UNION: demanded an increase of 16 cents per hour
COMPANY: refused outright

                        cents increase per hour
COMPANY                 00   02   04   06   08   10   12   14   16    UNION
Estimated total value
in thousands of dol-    (0)  (8)  (16) (24) (32) (40) (48) (56) (64)
lars for two years
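
The legible entries on the wage scale rise by 8 thousand dollars for every 2-cent step, which suggests a simple linear rule. The sketch below assumes that rule holds across the whole scale; the function name `wage_cost_thousands` is a hypothetical label, not part of the original materials.

```python
def wage_cost_thousands(cents_increase):
    """Two-year cost to the company, in thousands of dollars, of an hourly raise.

    Assumes the linear pattern visible on the scale: 4 thousand dollars
    per cent of increase (8 thousand per 2-cent step).
    """
    if cents_increase % 2 or not 0 <= cents_increase <= 16:
        raise ValueError("the example scale lists even steps from 0 to 16 cents")
    return 4 * cents_increase

# Rebuild the row of estimated values shown beneath the scale.
scale = {cents: wage_cost_thousands(cents) for cents in range(0, 17, 2)}
```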
Participants also received data on each of the
nine issues for four other local textile plants in the 
same community and averages for other industries 
in the same city. Two of the other textile plants 
employed the same type of workers as the Towns- 
ford Company. 

Five of the nine issues involved money. Four 
others, like the question of seniority, did not. 

In addition, union representatives received a 
more detailed, one-page memorandum explaining 
the union’s position, while company representa- 
tives received a one-page company memorandum 
explaining the company position in more detail. 

A summary of union and company positions and
the normative position defined by the two other
plants with the same type of workers is shown
in Chart 1.


Subjects 


Prior to presenting the bargaining problem, the 
256 subjects, all graduate business students, were 
assigned as the union or company representatives 
according to whether their scores on a 42-item 
questionnaire about union-management attitudes 
developed by Hepler (1953) and refined by Camp- 
bell (1960) were above or below the sample median. 
This was to increase the identification of subjects 
with the position they had to take as representa- 
tives. 
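
A median split of this kind can be sketched as follows. The text does not state which end of the attitude scale was assigned to which role, nor how scores exactly at the median were handled, so the direction (higher score assigned to the union role) and the tie rule below are assumptions, as are the subject labels and scores.

```python
from statistics import median

def assign_roles(scores, high_role="union", low_role="company"):
    """Assign each subject a role by comparing the attitude score with the sample median.

    Assumption: scores above the median go to high_role; scores at or
    below the median go to low_role.
    """
    cutoff = median(scores.values())
    return {
        subject: (high_role if score > cutoff else low_role)
        for subject, score in scores.items()
    }

# Hypothetical scores for four subjects.
scores = {"s1": 92, "s2": 101, "s3": 126, "s4": 135}
roles = assign_roles(scores)
```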

As might be expected, the 256 graduate business
students as a whole were more pro management
(X̄ = 117.4) than the 132 undergraduates (X̄ =
120.0) drawn from psychology classes by Camp-
bell, but the range of attitudes was about the 
same for both samples. 

Fifty percent of the 256 graduate business stu- 
dents scored between 101 and 126. Ninety percent 
scored between 92 and 135. Their median score was 
114. They tended to pile up cases in the moderate 
pro management portion of the scale while students 
from psychology classes tended to concentrate 
more heavily in the moderate pro union region. 


The Assignment 


Assembled in a large classroom, the subjects 
were instructed as follows: 


You are going to take part in a study of
collective bargaining. You will be assigned as a
union or company representative depending on 
your expressed attitudes in the questionnaire 
you completed two weeks ago. 


Participants were given 5 minutes to read the 
background and contract information described 
before. Then a copy of the contract itself was 
given each participant, and the instructions con- 
tinued. 


In order to settle an issue, both negotiators 
must accept some specific position on the issue. 
When both negotiators are in agreement on an 
issue, one man should read aloud the issue and 
the position to be endorsed. He then circles on
both copies of the contract the position to be
endorsed, and each man initials the item in the 
space provided at the right on both copies. Once 
both negotiators have agreed and initialed the 
issue, it is settled for the two-year contract 
period, and it may not be changed later in the 
negotiations. Any man may open the discussion 
and any man may read and circle the position 
of the issue once agreement is reached. These
procedures have been established by joint agree- 
ment of the union and company. 

Issues for Bargaining (Chart 1) is a list of the 
issues you are to settle and a memorandum 
concerning the issues. You will be given time to 
examine this information and to take another 
look at the Background Information. 

There are nine issues to be resolved. The is- 
sues are not arranged in any order of impor- 
tance, and you may discuss them in any order 
or combination you desire. Under each issue you 
will find a statement of the specific provisions 
of the past contract and the positions of the 
company and the union when negotiations ended 
last week. Next, you will find a scale that shows 
at the left the present position of the company 
and at the right the present position of the union 
on a given issue. Between these extremes some 
possible compromises are listed for your con- 
venience. And, finally, beneath the scales in the 
parentheses, you will find estimates of the 
amounts of money (in thousands of dollars) 
that each of the possible agreements directly 
above would cost or gain for your group in two 
years. 


In the first two experiments, negotiators were 
reminded each 5 minutes of the amount of time 
being consumed in negotiation. They were to 
consider each 5 minutes as being equivalent to 
1 full day of negotiation. At the end of each 5- 
minute interval, a loss accrued of an additional 
$6,000 to each side in wages or profits. No such 
loss was involved in Experiment 3. 

In the first and third experiments, if no contract 
was negotiated completely in 70 minutes (or 14 
simulated days), negotiations were broken off and 
the strike continued. 
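
The time rules above reduce to simple arithmetic, sketched below. The constant and function names are hypothetical. As described in the text, the $6,000 loss applied in Experiments 1 and 2 and the 70-minute deadline in Experiments 1 and 3, so only Experiment 1 combined both rules.

```python
MINUTES_PER_SIMULATED_DAY = 5   # each 5 minutes counted as 1 full day of negotiation
LOSS_PER_DAY = 6_000            # dollars lost by each side per completed interval
DEADLINE_MINUTES = 70           # negotiations broken off after this many minutes

def simulated_days(minutes_elapsed):
    """Number of completed 5-minute intervals, each treated as one day."""
    return minutes_elapsed // MINUTES_PER_SIMULATED_DAY

def accrued_loss(minutes_elapsed):
    """Loss to each side, charged at the end of every completed interval."""
    return simulated_days(minutes_elapsed) * LOSS_PER_DAY

days_at_deadline = simulated_days(DEADLINE_MINUTES)
loss_at_deadline = accrued_loss(DEADLINE_MINUTES)
```

Under these rules a pair that bargained to the deadline would have accumulated the loss for all 14 simulated days.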

The second experiment was run without dead- 
lines so that the negotiations continued until all 
contracts were signed. The last contract in Ex- 
periment 2 was signed in about 130 minutes as a
consequence of some prodding by the experi- 
menter. 


Chart 1
ISSUES FOR BARGAINING


1. Hospital and Medical Plan:
Past Contract: Company paid 1/4 of cost, employee paid remaining 3/4
UNION: demanded company pay full cost
COMPANY: refused to pay more than 1/4

                     proportion of company payment
COMPANY              1/4    1/2    3/4    4/4    UNION
Total money value    (0)    (6)    (12)   (18)


2. Wages:
Past Contract: $1.94 per hour
UNION: demanded an increase of 16 cents per hour
COMPANY: refused outright

                     cents increase per hour
COMPANY              00   02   04   06   08   10   12   14   16   18    UNION
Total money value    (0)  (8)  (16) (24) (32) (40) (48) (56) (64) (72)


3. Sliding Pay Scale to Conform to Cost of Living:
Past Contract: pay scale is fixed through the term of the contract
UNION: demanded pay increases in proportion to increases in the cost of living
COMPANY: rejected outright

COMPANY              NO     YES    UNION
Total money value    (0)    (20)


4. Seniority:
Past Contract: straight plant-wide seniority, workers are laid off on the basis of the number of
years with the company
UNION: rejected any changes in the seniority principle
COMPANY: demanded some flexibility in the seniority rule; wants to establish departmental
seniority (seniority rule would apply within departments only)

COMPANY              YES    NO     UNION
Total money value    (0)    (0)


5. Union Representative on the Board of Directors:
Past Contract: no union representative on the Board
UNION: demands one union representative be appointed
COMPANY: rejected outright

COMPANY              NO     YES    UNION
Total money value    (0)    (0)


6. Night Shift Differential:
Past Contract: an extra 5 cents per hour is paid for night work
UNION: demands a 5 cent increase to 10 cents per hour
COMPANY: rejected

                     cents increase per hour
COMPANY              0    1    2    3    4    5    UNION
Total money value    (0)  (1)  (2)  (3)  (4)  (5)


7. Vacation Pay:
Past Contract: 2 weeks paid vacation for all workers with one year service
UNION: wants 3 weeks paid vacation for workers with 10 years of service
COMPANY: rejected

                     2 wks. for   3 wks. for   3 wks. for   3 wks. for
                     1 year       20 years     15 years     10 years
                     service      service      service      service
COMPANY                                                                  UNION
Total money value    (0)          Q4)          (2)          (5)


8. Establishment of a Work Rules Committee:
Past Contract: no work rules committee exists
UNION: rejected establishment of committee
COMPANY: demanded establishment of a work rules committee composed of two company repre-
sentatives, two union representatives and two efficiency engineers from an industrial
consulting firm to study and to be responsible for changes in work rules

COMPANY              YES    NO     UNION
Total money value    (0)    (0)


9. Check-Off System:
Past Contract: workers pay union dues to union representatives on pay day
UNION: demanded a check-off system whereby the company deducts union dues from the worker's
pay for the union
COMPANY: rejected the check-off system

COMPANY              NO     YES    UNION
Total money value    (0)    (0)


TREATMENTS
Experiment 1
A total of 33 contracts were negotiated
by the 66 subjects who were involved in
the first experiment. For Treatment A,
prior to negotiations between single union
and single company representatives, two
unilateral groups, one of nine pro manage-
ment men and the other of nine pro union
men were told to formulate strategy in
preparation for subsequent negotiations, as
company or union representatives respec-
tively:


You should use the 30 minutes to plan your 
bargaining strategy, to formulate a package of 
agreements, to prepare for concessions and to 
decide on items on which you feel as a group each 
man representing you should stand firm. 


For Treatment B, one unilateral company 
group of eight and one unilateral union 
group of eight were instructed to study the 
issues as follows: 


You should use the 30 minutes to learn as a 
group as much as you can about union and com- 
pany positions. Rather than formulate any strate- 
gies for bargaining, the purpose of this 30 minute 
study is to promote understanding of the other 
point of view in comparison to your own, to see 
the areas of greater and lesser disagreement. 


For Treatments C and D, 16 union and
16 company representatives were given the
preceding assignment, then divided into 
four bilateral study groups. Each joint or 
bilateral group contained four union and 
four company representatives who were 
instructed as follows: 


In these study groups of union and company 
representatives, you should devote the 30 minutes 
discussion time with learning as much as you can 
about each others’ positions. You should do no 
negotiating or bargaining during this time. The 
purpose of the study group is to promote under- 
standing of the other point of view in comparison 
to your own, to see the areas of greater and lesser 
disagreement. Negotiation will come later. 


Following this, for Treatment C, half of 
these representatives negotiated as indi- 
viduals with other single individuals from 
their own bilateral study group, while for 
Treatment D, the other half had to nego- 
tiate with counterparts who had been in a 
different bilateral study group. 

Subjects of each group knew only about 
their own treatment until the postsession
critique. 

Some strategic planning actually took 
place in the unilateral study groups, but it 
involved setting general guidelines within 
which its members could remain highly 
flexible. For example, the union unilateral 
study group set a goal of $61,500 in mone-
tary concessions to be obtained from the
company, but no specific way of attaining 
this was decided. 


Experiment 2 


For all four treatments in this second
experiment, a total of 16 six- or seven-man
strategy or unilateral study groups was
formed, 8 groups representing the
union and 8, the company. For each treat-
for each side. A total of 102 subjects who 
subsequently negotiated 51 contracts were 
involved in this second experiment. 

Treatments A’ and B’ were near-replica- 
tions of Treatments A and B of Experiment 
1. The difference was that in Experiment 2 
there were no negotiating deadlines to be 
taken into account in planning strategies 
(Treatment A’) or studying the issues in 
unilateral study groups (Treatment B’). 
In Treatment E, the two groups of company 
men planned strategies before bargaining as 
individual negotiators while the two groups 
of union representatives studied the issues 
unilaterally beforehand. Treatment F was 
the reverse of Treatment E. The two com- 
pany groups studied the issues beforehand 
while the two union groups planned strate- 
gies in advance of negotiations. 


Experiment 3 

A total of 88 subjects took part in this 
third experiment. Of these, 24 underwent 
Treatment A”, negotiating 12 contracts 
after having planned strategies as union or 
company representatives. Twenty nego- 
tiated 10 contracts after having studied the 
issues in unilateral union or company groups 
of five men each (Treatment B”). The re-
maining 44 planned strategies alone (Treat-
ment G) or studied the issues alone (Treat-
ment H), with 11 union and 11 company
representatives in each treatment, so as to
provide the representatives needed for sub-
sequently negotiating 22 contracts.

Inadvertently, no mention was made of 
the cost of negotiating time in this third 
experiment. For this reason, Treatments
A” and B” are not directly comparable
with Treatments A’ or A and B’ or B of the
preceding experiments, as the effect of such
announced costs, per se, on negotiations is
unknown, although it may be quite mini-
mal according to Campbell (1960). He
found that when stated costs per simulated 
day were $3,000, negotiators did not behave 
significantly differently than when costs 
were $6,000. 


NEGOTIATIONS 


Following the 30-minute study or strategy 
session, 128 representatives negotiated with 
128 opposite representatives, in pairs, until 
a contract was signed or (in the first and 
third experiments) the paired negotiators 
deadlocked at the end of 70 minutes of 
bargaining. Actually, 7 of the 33 pairs of the 
first experiment failed to sign contracts in 
the 70 minutes and 5 of the 44 pairs of the 
third experiment failed to sign contracts. 
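
The counts reported for the three experiments can be cross-checked with a few lines of arithmetic. The sketch below uses only figures stated in the text; the variable names are illustrative, and the deadlock proportions are derived here rather than reported by the author.

```python
# Subjects per experiment, as stated in the Treatments section.
subjects = {1: 66, 2: 102, 3: 88}

# Each contract was negotiated by one union and one company representative.
contracts = {experiment: n // 2 for experiment, n in subjects.items()}

# Pairs that failed to sign within 70 minutes (Experiment 2 had no deadline).
deadlocks = {1: 7, 3: 5}
deadlock_rate = {
    experiment: deadlocks[experiment] / contracts[experiment]
    for experiment in deadlocks
}
```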

A postnegotiations questionnaire was com- 
pleted by all participants concerning how 
long they would like to see the contract 
last, how skilled they felt the other nego- 
tiators had been, how acceptable the con- 
tract was to one's side, who got the better 
deal, the defensibility of one's position, and 
how congruent the role they had played had 
been with their own beliefs. They also ranked 
the nine bargaining issues in order of im- 
portance. 

A postsession critique was conducted with 
negotiators of a given experiment assembled 
together and by individual interviews. 


ANALYSES OF VARIANCE 


Four objective and 15 subjective varia- 
bles were examined. The time to complete 
contracts, the absolute departure of the 
contract settlement from the “going rate,” 
the favorability of the settlement to the 
company and the extent negotiators agreed 
on the relative importance of the nine bar- 
gaining issues were the objective data gath- 
ered. 

Two sets of subjective responses were obtained
concerning each contract, one from a union
negotiator, the other from a company negotiator.
These included the following 15 responses:

1-9 Rank in importance assigned to each
    bargaining issue

10  Length of time in months the respondent
    would like to see the contract in force

11  Acceptability of contract to one's side (1 =
    unacceptable; 2 = partially acceptable;
    3 = highly acceptable; 4 = fully acceptable)

12  Estimate of who got the better deal in the
    settlement (1 = union; 2 = both about
    equal; 3 = company)

13  How easy it was to defend one's position
    (3 = own position more defensible; 2 =
    both equal; 1 = own position less defensible)

14  Congruence of assigned role to one's real
    beliefs (5 = completely; 4 = highly; 3 =
    somewhat; 2 = slightly; 1 = not at all)

15  Quality of the opponent's performance as a
    negotiator (1 = very poor; 2 = poor;
    3 = fair; 4 = good; 5 = very good)


For each of these variables in Experiment
1, analysis of variance contrasted the effects
of the four treatments. The degrees of freedom
were distributed as follows when data
were available for all 33 contracts or 66
negotiators.


Objective variable                      df
  (i) Between treatments                 3
  Error                                 29
  Total                                 32

Subjective variable                     df
  (i) Between treatments                 3
  (j) Union or company respondent        1
  i x j                                  3
  Error                                 58
  Total                                 65


In Experiment 2, the analysis of objective
data was as follows for 48 of the 51 contracts.
To provide an equal number of cases per
treatment (12), a total of three contracts
were withdrawn randomly from the subjects
of Treatments B’ and F.


Objective variable                      df
  (i) Union strategies or studies        1
  (j) Company strategies or studies      1
  i x j                                  1
  Error                                 44
  Total                                 47

Subjective variable                     df
  (i) Union strategies or studies        1
  (j) Company strategies or studies      1
  (k) Union or company respondent        1
  i x j                                  1
  i x k                                  1
  j x k                                  1
  i x j x k                              1
  Error                                 88
  Total                                 95


For Experiment 3, the objective analysis
was as follows for 36 contracts of the 44 from
which 9 were withdrawn because of deadlocks
or at random to equalize the number
of cases in each treatment:


Objective variable                      df
  (i) Strategies or studies              1
  (j) Alone or in groups                 1
  i x j                                  1
  Error                                 32
  Total                                 35

Subjective variable                     df
  (i) Strategies or studies              1
  (j) Alone or in groups                 1
  (k) Union or company respondent        1
  i x j                                  1
  i x k                                  1
  j x k                                  1
  i x j x k                              1
  Error                                 64
  Total                                 71


Results


Mean differences among the experimental
treatments will be discussed in order in the
text that follows. The relevant completed
analyses of variance for Experiments 2 and
3 are tabulated for reference as an Appendix.
Such complete analyses were not undertaken
for Experiment 1, and only the results
of such analyses that were done for
Experiment 1 will be stated in the text.
The tabled analyses in the Appendix are as
follows:


Table for       Table for       Analyses of variance of
Experiment 2    Experiment 3    the following
A1              A5              Contract outcomes
A2              A6              Specific settlements
A3              A7              Satisfaction with outcomes
A4              A8              Ranked importance of issues


OBJECTIVE RESULTS


Speed of Conflict Resolution 


It was assumed that for comparable 
conditions, the firmness of two negotiators’ 
commitments to their respective groups or 
strategies would be reflected in the total 
amount of time they required to settle the 
nine issues and to sign the contract. The 12 
deadlocks, where no resolution was achieved, 
necessitated calculating harmonic mean 
times rather than mean times for all cases 
in order to take into account the absence of 
values for the deadlocked cases. (The har- 
monic mean is the reciprocal of the mean 
of the reciprocals of the original values. 
Deadlocks are assumed to be settled in an 
infinite amount of time. The reciprocal of 
infinity is zero, so the deadlocks now can be 
treated as zero values and included in the 
distribution with the reciprocals of all other 
obtained values. The harmonic mean time 
can be interpreted as the speed with which 
the conflict was resolved where the mean 
time is the length of time required to resolve 
the conflict. Where many deadlocks oc- 
curred as in Treatment A of the first ex- 
periment, a highly inflated harmonic mean 
was produced as seen in Table 1.) 
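
The harmonic-mean calculation described above can be sketched in a few lines. This is only an illustration of the stated rule (deadlocks enter as reciprocals of zero); the function name and the sample times are hypothetical and are not data from the experiments.

```python
import math


def harmonic_mean_time(times):
    """Harmonic mean of settlement times in minutes.

    A deadlock is passed as math.inf; its reciprocal is taken as zero,
    as the text describes. Settled times are assumed to be positive.
    """
    if not times:
        raise ValueError("need at least one negotiation")
    # Reciprocal of each time is a "speed"; a deadlock contributes zero speed.
    reciprocals = [0.0 if math.isinf(t) else 1.0 / t for t in times]
    mean_speed = sum(reciprocals) / len(reciprocals)
    if mean_speed == 0.0:
        return math.inf  # every pair deadlocked
    return 1.0 / mean_speed


# Hypothetical times (not from the study): three settled pairs, then one deadlock added.
settled_only = [20.0, 30.0, 60.0]
with_deadlock = settled_only + [math.inf]

print(round(harmonic_mean_time(settled_only), 1))   # about 30 minutes
print(round(harmonic_mean_time(with_deadlock), 1))  # about 40 minutes
```

Adding a single deadlock lengthens the harmonic mean even though no finite time is recorded for it, which is why a treatment with many deadlocks shows a highly inflated value.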

Where deadlines were employed, in the
first and third experiments, they produced
deadlocks, as can be seen from Table 1.
These deadlocks did not appear haphazardly. 
They never occurred for Treatments G and 
H where individuals planned or studied 
alone prior to negotiations. They occurred 
most often when negotiators had been in 
strategy planning groups before negotiating 
as individuals. 

When no deadlines were applied, as in
Experiment 2, the differential effects of
Treatments A’ and B’ (group strategy planning
versus group study) on speed of negotiation
disappeared. The obtained differences
in harmonic mean times between Treatments 
A and B and A” and B” of the first and 
third experiments hinged on the differential 
amounts of deadlocks generated by the 
respective treatments. Subsequent critiques 
of the strategies formulated by opponents 
suggested that strategy planning led to 
relatively less flexible positions. If deadlines 
were imposed and if the strategists had non- 
overlapping strategies, deadlocks were likely. 
On the other hand, if the opposing strategists 
had highly overlapping strategies, very 
speedy resolution was possible. For example, 
if the company’s strategy was to hold firm 
on no wage increase and the union’s strategy 
was not to settle for less than a 6-cent 
increase, a deadlock was probable. Con- 
versely, if the company strategy was to 
yield a maximum of an 8-cent increase on 
wages and the union was happy to settle for 
6 cents, a more rapid negotiation of this 
issue was likely than if opponents had no 
fixed plans and had only studied the issues. 
In short, the effects of strategic thinking 
could produce deadlocks on the one hand, 
or faster-than-average resolutions on the 
other. Where deadlines were imposed, the 
overall effect on the calculated harmonic 
means was to yield slower speeds of nego- 
tiating for strategists than for those who 
studied the issues. Where no deadlines were 
imposed (Experiment 2), strategic thinking 
led to as fast or faster resolution than 
studying the issues—although, to repeat, 
this mean difference depended on the con- 
tents of the opposing strategies involved. 

In all three experiments, harmonic mean
times varied significantly at the 1% or 5% 
level as a function of treatment. However, 
in Experiment 1, the three methods of 
studying the issues did not yield differ- 
entially significant speeds of settlement. In 
Experiment 2, fastest resolution occurred 
when company strategists faced union 
representatives who had studied the issues. 
In Experiment 3, the greater source of 
variance was associated with whether or 
not the negotiators had been alone or in 
TABLE 1
EFFECTS OF PRIOR EXPERIENCE OF NEGOTIATORS ON CONTRACTS THEY NEGOTIATE SUBSEQUENTLY

Columns: (1) number of negotiations; (2) deadlocks; (3) harmonic mean time in minutes;
(4) departure of settlement from going rate; (5) favorability of settlement to company*;
(6) agreement on relative importance of issues

Treatment                                      (1)  (2)    (3)    (4)    (5)          (6)
Experiment 1 (all in groups; deadlines)
  A  Plan strategy                               9    5  163.0  230.5   49.0          .60
  B  Unilateral study                            8    1   24.5  171.9   78.2          .52
  C  Bilateral study (with future opponents)     8    0   30.5  185.5   64.5          .76
  D  Bilateral study (with others)               8    1   20.5  170.1   79.9          .98
  All                                           33    7   61.0  183.5   67.9          .61
Experiment 2 (all in groups; no deadlines)
  A’ Plan strategy                              12    -   38.0  264.1  116.4          .50
  B’ Unilateral study                           14    -   41.3  245.1   40.2          .46
  E  Companies plan strategy, unions study      12    -   29.6  202.2  125.7          .24
  F  Companies study, unions plan strategy      13    -   38.7  295.2   57.7          .36
  All                                           51    -   37.1  266.6   85.2          .39
Experiment 3 (deadlines; no cost associated with negotiation time)
  A" Plan strategy (in groups)                  12    3   56.2  265.3    0.4          .54
  B" Unilateral study (in groups)               10    2   51.9  288.0   91.8          .48
  G  Plan strategy (alone)                      11    0   44.8  235.2   63.7          .40
  H  Study (alone)                              11    0   36.3  337.2  125.0  [illegible]
  All                                           44    5   47.4  281.5   70.7          .46
Grand totals                                   128   12   46.8  250.3   75.8          .47

* Minus signs have been omitted from in front of all values in this column to make reading easier.
To interpret, the higher the value in the column, the more favorable the settlement to the company.


groups rather than whether they had 
planned strategies or studied. All five dead- 
locks occurred for negotiators with pre- 
negotiation experience in groups. No dead- 
locks occurred for those who had worked 
alone prior to negotiating. There was a 
cumulative effect of treatments as can be 
seen in Table 2. Group strategists took 
longest; individuals who studied alone were 
fastest in negotiations. 

In all three experiments there was a 
subjective indication of greater conflict be- 
tween those who had planned strategy 
than between those who had studied the 
issues. Although no objective measures 
were made, the experimenter and assistants 


sensed that the noise level generated by the 
arguing of negotiators from strategy groups 
was far more intense than the noise created 
by the arguments of the negotiators from 
study groups. In sum, in comparison to 
those who only studied the issues, nego- 
tiators committed to strategies could ne- 
gotiate as rapidly, particularly if their 
strategy overlapped their opponent’s strat- 
egy, but they were more likely to be caught 
in deadlocks if forced to negotiate against 
deadlines. 


Direction of Settlement: Calculations 


A scoring procedure developed by Camp- 
bell (1960) was applied to each of the 116 
TABLE 2
HARMONIC MEAN TIME, IN MINUTES, TO REACH CONFLICT RESOLUTION BY NEGOTIATORS AS A FUNCTION
OF PREVIOUS EXPERIENCE IN STUDYING ISSUES OR PLANNING STRATEGIES ALONE OR IN
UNILATERAL GROUPS (EXPERIMENT 3)

                                        Prenegotiation experience
                              Group               Alone               Both
                              (N = 21 contracts)  (N = 23 contracts)  (N = 44 contracts)
Study (N = 22 contracts)      51.9                36.3                42.7
Strategy (N = 22 contracts)   56.2                44.8                49.3
Both (N = 44 contracts)       53.8                40.3                45.8


signed contracts. One score indicated how 
much the agreement reached by the two 
negotiators on each issue deviated alge- 
braically from the “going rate” in the other 
two comparable local textile plants in the 
same community. Another score indicated 
the absolute deviation of the settlement from 
the “going rate." 

The deviation values were obtained by 
assigning zero to a settlement of an issue 
at the going rate and converting succeeding 
intervals departing from this zero point into 
percentages of 100 units. If there were four 
deviating points on a scale for an issue,
then the deviations from the going rate 
were 25, 50, 75, and 100; for five points, 
they were 20, 40, 60, 80, and 100. If the 
going rate, or zero deviation, was between 
what the union and management were 
demanding, then the union position arbi- 
trarily was positive (more than the going 
rate) and the management, negative (less 
than the going rate). Therefore, the lower 
the sum of algebraic values of deviations 
for the nine contract issues from the going
rates, the more the settlement favored 
management; the higher the sum of alge- 
braic deviations, the more the agreement 
favored the union. When signs were taken 
into account, all mean settlements were 
negative, at less than the going rate, that is, 
favorable to the company. Column 5 of 
Table 1 shows the obtained means as a func- 
tion of the treatments of each experiment 
omitting the minus signs in front of each of 
the means. As displayed in Table 1, the 
higher the numbers the better the outcome 
for the company. 


When the signs were ignored in calculat- 
ing the sum of the absolute deviations for 
each contract, the value indicated simply 
how much the negotiators departed from 
the going rate, in one direction or the other, 
in settling each issue in order to reach a 
final agreement. Column 4 of Table 1 shows 
the aggregate results. They exclude, of 
course, the 12 deadlocked sets of negotia- 
tions since no contracts were agreed to by 
the deadlocked parties. 
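
The two settlement scores described above can be illustrated with a short sketch. It assumes, for simplicity, that each issue has the same number of deviating scale points on either side of the going rate; the function names and the three-issue contract are hypothetical and only show the arithmetic, not the original scoring sheets.

```python
def deviation_score(steps_from_going_rate, n_deviating_points):
    """Signed deviation of one issue from the going rate, in units of 100.

    steps_from_going_rate: scale steps away from the going rate
        (positive = union direction, negative = management direction).
    n_deviating_points: number of deviating points on the scale
        (4 points give 25, 50, 75, 100; 5 points give 20, 40, 60, 80, 100).
    """
    if n_deviating_points <= 0:
        raise ValueError("scale needs at least one deviating point")
    if abs(steps_from_going_rate) > n_deviating_points:
        raise ValueError("settlement lies beyond the end of the scale")
    return 100.0 * steps_from_going_rate / n_deviating_points


def contract_scores(issue_settlements):
    """Return (algebraic sum, absolute sum) over the issues of one contract."""
    signed = [deviation_score(steps, points) for steps, points in issue_settlements]
    algebraic = sum(signed)                 # lower = more favorable to management
    absolute = sum(abs(v) for v in signed)  # total departure from the going rate
    return algebraic, absolute


# Hypothetical contract with three issues: (steps from going rate, deviating points).
hypothetical_contract = [(-2, 4), (1, 5), (0, 4)]
print(contract_scores(hypothetical_contract))  # (-30.0, 70.0)
```

The algebraic sum lets gains on one issue offset concessions on another, while the absolute sum counts every departure regardless of direction.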

While treatments within experiments 
were a significant source of variance in the 
departure of the contracts from the going 
rate, similar treatments failed to maintain 
their differential effects in the three experi- 
ments. 


Departure from the Going Rate: Results 


As seen in Table 1, Column 4, for the 
first experiment the four bargaining pairs 
from the strategy groups who reached 
agreement had to depart more widely from 
the going rate than did the bargainers from 
study groups, as a whole. The mean abso- 
lute deviation for these four pairs of strate- 
gists was 230.5 while it was 175.8 for all 
those from study groups (p < .05). 

Again, in the second experiment, those
negotiators from strategy planning (Treatment
A’) reached agreements departing
more (264.1) from the going rate than those
who came from Treatment B’ unilateral
study groups (245.1), but these results
failed to attain statistical significance. In
the third experiment, results were opposite.
Those who studied (Treatments B” and H)
reached agreements scoring on the average
at 312.7. This was significantly greater,
at the 5% level of confidence,
than the value of 250.3 attained for those
who underwent the strategy Treatments
A” and G.

Significantly greater departures, 266.6 and 
281.5, occurred for Experiments 2 and 3 
than for Experiment 1 where the mean 
value was 183.5. It may be that negotiators 
were more prone to reach settlements closer 
to normative solutions to the various issues 
when they had combined pressure on them 
to settle quickly, that is, deadlines and 
announced costs for each 5 minutes they 
used to negotiate (Experiment 1). When 
they could operate without deadlines (Ex- 
periment 2) or they could use time to nego- 
tiate without any stated cost (Experiment 
3), they seemed more willing to accept 
resolutions departing more widely from 
going rates.


Favorability of Settlement to Company (Col- 
umn 5, Table 1): Results 


When deadlines were in operation (Ex- 
periments 1 and 3) settlements favoring 
the company were significantly (p < .05) 
more likely to be reached when negotiators 
had studied the issues beforehand rather 
than planned strategies. In Experiment 1, 
settlements averaged 74.2 for the 24 con- 
tracts between parties who had studied 
the issues and 49.0 for the four settled 
contracts between strategists. In Experi- 
ment 3, those who studied the issues (alone 
or in groups) yielded an average score of 
108.4 in favor of the company, which 
average was significantly greater (p < .05) 
than the 32.1 obtained for those who planned 
strategies before negotiating. 

Nevertheless, a complete and significant 
reversal occurred in Experiment 2, where 
there were no deadlines. Solutions favoring 
the company were most likely (p < .01) 
if the company representatives planned 
strategies (121.0), even more so when the 
company but not the union men planned 
strategies (125.7). Resolutions favored the 
union if both sides studied (40.2) or if the 
union planned strategies while company 
representatives studied beforehand (57.7). 

The grand mean for all 116 settlements of 


75.8 was clearly in the direction favoring the 
company as were these values for all 12 
treatments. This outcome may be an 
inherent game characteristic or a conse- 
quence of the modal attitude towards 
unions and management of the sample of 
256 graduate business students who served 
as negotiators for the three experiments. 


Specific Outcomes 


It was possible to do a more detailed 
analysis of the completed contracts of 
Experiments 2 and 3. Grand mean settle- 
ments of each issue and significant effects 
were as follows: 

Hospital and medical plan. The union 
had demanded that the company pay full 
cost; the company had refused to pay more 
than 25%. The mean settlement reached 
was identical in both Experiments 2 and 3. 
On the average, the company had to pay
60.5% of the cost. In Experiment 2, when
union representatives studied the issue
rather than planned strategies, the settlement
was significantly (p < .05) more in
their favor (65.7% versus 55.2% to be paid
by the company).

Wages. The union demanded an increase 
of 16 cents per hour; the company had 
refused outright. The mean settlements 
called for a 7.4-cent increase in Experiment 
2 and a 6.2-cent increase in Experiment 3. 
There were no significant treatment effects.

Sliding pay to conform to cost of living. 
The union had demanded that the scale 
shift with a rise in the cost of living; the 
company had refused outright. Eighteen of 
48 contract settlements contained sliding 
scales in Experiment 2, while 22 of 36 
contained such scales in Experiment 3. 
There were no treatment effects. 

Seniority. The union had demanded 
continuance of plantwide seniority; the 
company proposed departmental seniority. 
Thirty of 48 contracts maintained plant- 
wide seniority in Experiment 2, while 22 
of 36 did so in Experiment 3. In Experiment 3,
negotiators from a group that strategized before
bargaining were most likely (p < .05) to
maintain plantwide seniority (8 out of 9)
while negotiators from a group that studied
the issues were least likely (2 out of 9).
If negotiators had worked alone before negotiating,
strategy or study made no difference. In both of these
cases, 6 of the 9 contracts maintained plantwide
seniority.

Union representative on board of directors. 
The union had proposed this, and the com- 
pany had refused. None of the 48 settle- 
ments of Experiment 2 contained acceptance 
of this proposal and only 4 of the 36 con- 
tracts of Experiment 3 did so. 

Night-shift differential. The union had de- 
manded an extra 5 cents for night work; 
the company had refused. The mean settle- 
ment in Experiment 3 was 2.8 cents. There 
were no treatment effects in Experiment 3. 
In Experiment 2, the mean was 3.4 cents. 
Here the best settlement for the union at 4.4 
cents occurred when both sides had studied 
the issues (p < .01). The next best settle- 
ment occurred when both sides had planned 
strategies (3.8 cents). Company study in 
general led to higher settlements averaging 
3.9 cents in comparison to a mean of 2.9 
cents given up by company strategists 
(p < .05). 

Vacation pay. The union had wanted 3 
weeks vacation for 10 years service, re- 
quiring an increase of $5,000 annually in 
labor costs. The company had favored the 
status quo of 2 weeks with 1 year service, 
requiring no increase in labor costs. The 
mean settlement cost the company in Ex- 
periment 2 an increase of approximately 
$1,220 and a corresponding $1,130 in Ex- 
periment 3. There were no treatment effects. 

Work rules committee. The union had 
rejected establishment of a committee de- 
manded by the company. This committee 
of company, union, and industrial engineer- 
ing consultants was to be responsible for 
changes in work rules. In Experiment 2, 
20 of the 48 contracts called for establishing 
the committee, but 15 of these were con- 
tracts negotiated by company strategists 
and only 5 were contracts negotiated by
company men who had studied the issues 
(p < .01). Experiment 3 was a complete 
reversal. Twenty-five of 36 contracts called
for such a committee. But of these, 17 decisions
favoring the committee occurred
with the 18 contracts among negotiators 
who had studied the issues beforehand 


alone or in groups while only 8 of 18 came 
from those who had strategized before 
negotiating (p < .01). 

Checkoff system. The union had wanted 
the company to deduct dues from workers’ 
pay; the company had refused. In Experi- 
ment 2, 27 of the 48 settlements provided 
for introducing a checkoff system. Twenty- 
two of 36 agreed to the system in Experi- 
ment 3, most often (8 of 9) when negotiators 
had come from study groups; least often 
(3 of 9) when negotiators had studied alone. 
This compared to the acceptance rate of 
6 or 7 of 9 by negotiators who had planned 
strategies beforehand in groups or alone. 
(The interaction was significant at the 5%
level.)

In sum, when treatment did affect spe- 
cific outcomes, the effects were likely to be 
complex. Particularly sensitive to treatment 
effects were the settlements dealing with 
plantwide seniority, the night-shift differ- 
ential, the work rules committee, and the 
checkoff system. 


Agreement on Importance of Issues 


For each of the 128 pairs of bargainers,
the correlations in their rankings of the
importance of the nine issues were calculated,
then converted to Fisher's Z. The
mean correlations for each of the 12 treatments
are shown in Table 1, Column 6.
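
The averaging step can be sketched as follows. The text does not say which correlation coefficient was used for the rankings or whether the mean Z was transformed back to r; this sketch assumes a Spearman rank correlation for untied ranks and a back-transformation with tanh, and both function names are hypothetical.

```python
import math


def spearman_rho(ranks_a, ranks_b):
    """Spearman rank correlation for two rankings of the same items (no ties)."""
    n = len(ranks_a)
    if n != len(ranks_b) or n < 2:
        raise ValueError("rankings must have equal length of at least 2")
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


def mean_correlation_via_fisher_z(correlations):
    """Average correlations on Fisher's Z scale, then convert back to r.

    math.atanh raises ValueError for r = 1 or r = -1, so perfectly
    matching or perfectly reversed rankings are not handled here.
    """
    if not correlations:
        raise ValueError("need at least one correlation")
    z_values = [math.atanh(r) for r in correlations]
    return math.tanh(sum(z_values) / len(z_values))


# Hypothetical rankings of nine issues by two union-company pairs.
pair_1 = spearman_rho([1, 2, 3, 4, 5, 6, 7, 8, 9], [2, 1, 3, 4, 5, 6, 7, 9, 8])
pair_2 = spearman_rho([1, 2, 3, 4, 5, 6, 7, 8, 9], [3, 1, 2, 6, 4, 5, 9, 7, 8])
print(round(mean_correlation_via_fisher_z([pair_1, pair_2]), 2))
```

Averaging on the Z scale keeps correlations near the ends of the range from being underweighted, which a plain mean of r values would do.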

Dealing with attitudes of bargaining pairs 
toward the nine contractual issues, these 
correlations probably are less affected by the 
presence or absence of deadlines, or the 
presence or absence of negotiating time 
costs. The correlations were more depend- 
ent on how and when the attitudes were 
shaped. As can be seen in Table 1, if both 
parties had been together in bilateral study 
groups, they were in the highest degree of 
agreement (.76) about the relative import- 
ance of the nine issues. Conversely, if one 
party to a negotiating meeting studied 
while the other planned strategies as in 
Treatments E and F of Experiment 2, 
least agreement was likely (.24 and .36). 
Finally, in all three experiments, those who
planned strategies in groups (Treatments
A, A’, and A”) were slightly more in agreement
(.60, .50, .54) than those pairs of
negotiators who came from Treatments B,
B’, and B” unilateral study groups (.52,
.46, .48). To sum up, agreement on the
relative importance of issues by both parties 
is enhanced if the parties study the issues 
together beforehand. Less agreement occurs 
if each side plans strategies, and still less 
agreement is likely if each side studies the 
issues unilaterally. 

Evidently, while bilateral study fails to 
produce faster resolution of conflict in 
comparison to unilateral study, it does seem 
to commit its specific members from both 
sides to agreement about the order of impor- 
tance of issues. 

Each bilateral study group emerges with 
different schedules of commitment of its 
members. If negotiators come from the 
same bilateral study group, they are most 
likely to agree on what issues are important; 
if they come from different study groups, 
they have strong commitments to different 
orders and are most likely to disagree. 


Interrelations among Contract Outcomes


Table 3 displays the product-moment 
correlations among the four contract out- 
comes just discussed: harmonic mean ne- 
gotiating time, departure of settlement from 
going rate, favorability of settlement to the 
company, and agreement of the opposing 
parties on the relative importance of the 
issues. 

Certain consistent relations are apparent. 
The longer the parties spent in negotiating, 
the less likely were the contract settlements 
to depart absolutely from the going rates. 
Correlations were negative for all 12 treat- 
ments. If closeness of the settlement to the 
going rate is regarded as a measure of the 
quality of the settlement, then it is clear 
that speedy settlements were incompatible 
with the quality of the settlements. 

In Experiment 1, fast resolution favored
the company (-.34), but a scattered
pattern of results was obtained for Experiments
2 and 3, making any overall
TABLE 3
INTERCORRELATIONS FOR EACH TREATMENT AMONG DURATION OF STRIKE, DEPARTURE OF
SETTLEMENT FROM NORMS, AND AGREEMENT ON ISSUES FOR 116 SETTLEMENTS

Columns: N = number of contracts settled; (1) negotiating time versus departure of settlement;
(2) negotiating time versus favorability to company; (3) negotiating time versus agreement on
issues; (4) departure versus favorability to company; (5) departure versus agreement on issues;
(6) favorability versus agreement on issues

Treatment                                       N    (1)    (2)    (3)    (4)          (5)          (6)
Experiment 1
  A  Plan strategy                              4   -.99   -.20   -.76    .30          .76  [illegible]
  B  Unilateral study                           7   -.04   -.09   -.32    .04          .22        -.928
  C  Bilateral study (with future opponents)    8   -.06   -.58    .24   -.36         -.08          .21
  D  Bilateral study (with others)              7   -.73   -.36   -.40    .22          .12         -.06
  All                                          26   -.54   -.34   -.35    .05          .30         -.05
Experiment 2
  A’ Plan strategy                             12   -.44    .12   -.02   -.14          .28         -.36
  B’ Unilateral study                          14   -.01    .18   -.48    .55         -.82          .01
  E  Companies plan strategy, unions study     12   -.28   -.72    .08    .66          .20          .15
  F  Companies study, unions plan strategy     13   -.05    .13   -.44    .43          .30         -.25
  All                                          51   -.19   -.12   -.32    .40          .19         -.12
Experiment 3
  A" Plan strategy (in groups)                  9   -.02    .23    .06    .64         -.40         -.20
  B" Unilateral study (in groups)               8   -.27    .39   -.44    .00          .09         -.02
  G  Plan strategy (alone)                     11   -.30   -.20    .01    .61         -.16          .01
  H  Study (alone)                             11   -.21   -.08    .09    .04  [illegible]         -.84
  All                                          39   -.20    .10   -.08    .38         -.08         -.14
generalizations impossible about negotiating
time and favorability of outcomes to the
company.

It would stand to reason that the more
opposing sides agreed on the relative importance
of the nine conflicting issues, the
less time they would need to reach agreement
on how to resolve the conflicts. (For
instance, it might make it easier for them 
to formulate and accept package deals.) The 
expected negative correlations were obtained 
in seven instances; correlations close to 
zero (.08, .06, .01, and .09) were obtained 
for four treatments and only one sizable 
positive correlation (.24) appeared. The 
aggregate mean correlations for the three 
experiments were —.35, —.32, and —.08, 
respectively. It is inferred that agreement 
on the importance of issues was associated 
with speedy negotiations. 

Since all settlements tended to favor the 
company, it again seems reasonable for 
greater departures from the going rates to 
have coincided with better settlements from 
the company’s standpoint. For the three 
experiments, the mean correlations were 
.05, .40, and .38. However, their inconsistent 
scatter made interpretation difficult. Simi- 
larly, a haphazard pattern of correlations 
appeared between agreement on issues and 
departure of settlement and agreement on 
issues and favorability of outcome to the 
company. In sum, settlements closer to the 
going rate were more likely to be achieved 
if negotiations went more slowly. And such 
slower decisions were likely when negotiators 
initially were in less agreement about the 
importance of the issues. 


SUBJECTIVE RESULTS 


Ranked Importance of Issues 


As Table 4 shows, there was complete 
agreement as a whole by respondents from 
all three experiments on the order in im- 
portance of the first three issues and almost 
as much agreement on the ordering of the 
last two issues. Wages, sliding pay, and the 
hospital-medical plan were most important; 
the union representative on the company 
board and the checkoff were least. 

A question can be raised about the ap- 
plicability of the analysis of variance model 


TABLE 4
RANKED IMPORTANCE OF THE NINE BARGAINING ISSUES IN THE THREE EXPERIMENTS

                                                Experiment
Issue                                        1       2       3
Wages                                      1.52    1.72    1.49
Sliding pay scale                          2.52    3.53    2.96
Hospital-medical plan                      4.00    3.85    4.22
Seniority                                  5.42    4.56    4.85
Work rules committee                       6.31    5.30    5.11
Night-shift differential                   4.88    5.61    5.95
Vacation pay                               6.19    6.20    6.13
Union representative on company board      6.52    6.14    6.41
Checkoff                                   7.00    7.94    7.82


since data contributing to the means were 
influenced by grouping effects assuming 
that individual respondents were influenced 
by group discussions prior to negotiating 
and by common strategies to which they 
committed themselves. If only one group 
was represented in a cell, this would reduce 
the within-cell variance; if several groups 
were represented, it would probably serve 
to inflate the within-cell variance. 

Treatments had complex effects on the 
relative importance negotiators attached to 
some of the various issues. It is important 
to keep in mind the built-in negative rela- 
tions among the rankings of the nine issues. 
If one issue was ranked high, some other
issue was forced to be ranked lower. 
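
The size of this built-in negative relation can be illustrated under a simple assumption that is not part of the study: if every ordering of n issues were equally likely, the ranks given to any two issues would correlate exactly -1/(n - 1), or -1/8 for nine issues. The sketch below checks this by enumerating all orderings for a small n; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def rank_correlation_between_items(n_items):
    """Exact correlation between the ranks of two distinct items when
    every ordering of n_items is equally likely (an illustrative null model)."""
    if n_items < 2:
        raise ValueError("need at least two items")
    orderings = list(permutations(range(1, n_items + 1)))
    total = len(orderings)
    mean_rank = Fraction(n_items + 1, 2)
    # Covariance between the ranks of the first and second item.
    covariance = sum((p[0] - mean_rank) * (p[1] - mean_rank) for p in orderings) / total
    # Variance of the rank of one item (the same for every item).
    variance = sum((p[0] - mean_rank) ** 2 for p in orderings) / total
    return covariance / variance


print(rank_correlation_between_items(4))  # -1/3
print(Fraction(-1, 9 - 1))                # -1/8, the value the formula gives for nine issues
```

Because the ranks always sum to the same total, a treatment effect that raises one issue's mean rank must lower the mean rank of at least one other issue.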

Wages. In all three experiments this was 
the most important issue. For the three 
experiments, average ranks of 1.52, 1.72, 
and 1.48 were assigned to wages. In the 
first experiment, the assignment was un- 
affected by treatment or position as union 
or company representative. 

In Experiment 2, an interesting interac- 
tion significant at the 5% level appeared. 
Wages were judged relatively less important 
by both union and company respondents 
whenever they had similar prenegotiation 
experience and were judged more important 
when they had different prenegotiation 
experience. Table 5 shows the outcomes. 

TABLE 5
RANKED IMPORTANCE OF VARIOUS ISSUES AS A FUNCTION OF PRENEGOTIATION EXPERIENCE
IN EXPERIMENT 2

                                48 union respondents   48 company respondents   96 respondents
                                Company     Company    Company     Company      Company     Company
                                plans       studies    plans       studies      plans       studies
                                strategy               strategy                 strategy
Wages
  Union plans strategy          2.08        1.42       1.83        1.75         1.96        1.58
  Union studies                 1.42        2.17       1.25        1.83         1.33        2.00
Sliding pay scale
  Union plans strategy          2.83        4.33       3.50        3.00         3.17        3.67
  Union studies                 6.17        3.33       3.33        1.75         4.75        2.54
Seniority
  Union plans strategy          5.33        3.50       5.83        3.33         5.58        3.42
  Union studies                 2.83        4.67       5.67        5.33         4.25        5.00
Work rules committee
  Union plans strategy          4.92        5.75       4.08        6.00         4.50        5.88
  Union studies                 5.00        4.92       4.75        7.00         4.88        5.96
Night-shift differential
  Union plans strategy          6.83        4.75       6.42        4.92         6.63        4.83
  Union studies                 4.08        6.92       5.00        6.00         4.54        6.46
Vacation pay
  Union plans strategy          7.42        6.00       6.75        6.33         7.08        6.17
  Union studies                 5.75        5.75       5.17        6.42         5.46        6.08
Union representative on board
  Union plans strategy          2.50        7.67       4.42        8.33         3.46        8.00
  Union studies                 8.17        4.33       8.75        4.92         8.46        4.63


In Experiment 3, two significant interactions
emerged. As shown in Table 6, those
who studied in groups or planned strategies
alone ranked wages as significantly
(p < .01) more important than those who
studied alone or planned strategies in groups.
Above and beyond this, company respondents
who worked in groups prior to negotiations
and union respondents who worked
alone regarded wages as more important.


TABLE 6
RANKED IMPORTANCE OF WAGES BY UNION AND COMPANY RESPONDENTS AS A FUNCTION
OF THEIR PRENEGOTIATION EXPERIENCE IN GROUPS OR ALONE

                              Group    Alone    Both
Planned strategy              1.60     1.20     1.40
Studied                       1.20     1.95     1.58
Company respondents only      1.25     1.95     1.60
Union respondents only        1.55     1.30     1.42

Sliding pay scale. This was the next most
important issue in all three experiments.
Mean ranks assigned were 2.52, 3.53, and
2.96. While treatments had no significant
effects in the first and last experiments, in
Experiment 2 company respondents judged
the issue significantly more important
(p < .01) than did union respondents (2.90
versus 4.17), particularly if the company
men had studied the issues rather than
planned strategies. Table 5 shows the effects
of treatments on judgments here.

The triple interaction for Experiment 2
was significant at the 5% level. Thus, the
issue was judged most important (1.75) by
company respondents who studied and negotiated
with opponents who studied; the
issue was seen as much less important
(6.17) by union men who had studied but
who bargained with company strategists.


Hospital and medical plan. For the three
experiments, this issue was next most important,
ranking 4.00, 3.85, and 4.22,
respectively. This time there were no significant
effects associated with the first two
experiments, but in Experiment 3 union
men clearly found this issue more important
(p < .01) than did company respondents
(3.27 versus 5.17). Those who had worked
in groups judged the issue more important
(p < .05) than those who worked alone
(3.82 versus 4.62). Those who had studied
in groups seemed particularly concerned
about this issue (p < .01) compared to
those treated in all other ways.

Seniority. This was judged fourth in 
importance in Experiments 2 and 3 and 
fifth in Experiment 1. Those who worked 
alone in Experiment 3 prior to negotiating 
saw this issue as more important (p < .01) 
than those who worked in groups (4.58 
versus 5.12). In Experiment 2, company 
respondents felt more strongly about the 
matter (p < .05) than union men (4.08 
versus 5.04), but the most sizable effect
seemed a consequence of the two sides
having had different prenegotiation
experiences. As seen in Table 5 for
the 96 respondents as a whole, where
both sides had studied or both had planned
strategies before negotiations, they saw the
issue as less important (p < .01).

Work rules committee. This was judged 
fifth in importance in the last two experi- 
ments and seventh in Experiment 1. In 
Experiment 2, as can be seen in Table 5, 
the issue was more important particularly 
for company men (p < .05) where they 
had planned strategies rather than studied. 

Night-shift differential. This issue was judged sixth in
Experiments 2 and 3 and fourth in Experiment 1.
Union respondents regarded
this differential as more important (p < .01)
than company respondents in both Experiments
1 and 3 but no differently in
Experiment 2. For the three experiments,
the figures for union and company respondents
were, respectively: 4.41 versus 5.34,
5.65 versus 5.59, and 5.25 versus 0.65.
Again, a complex interaction was significant
(p < .01) for Experiment 2 as a function
of treatments, as Table 5 shows. The night-shift
differential was more important to
negotiators whose prenegotiating experience
had been different rather than similar.

Vacation pay. Judged sixth, seventh, or
eighth in importance, vacation pay was
seen as more critical (p < .05) by all respondents
when union men studied rather
than planned strategies before negotiating
(5.77 versus 6.63) but particularly (p < .05)
when they had to deal with company strategists.
Here, the mean judgment of all
involved respondents was 5.46.

Union representative on company board. 
This was the only issue in Experiment 1 
whose importance was significantly af- 
fected by treatment (p < .01). The pro- 
posal that a union representative sit on the 
company board was more important for 
strategy planners than those who studied 
the issues (5.94 versus 6.67). Evidently, 
this must have been an item pushed by the 
union during its strategy planning which 
raised its saliency to union strategists and 
the company strategists who had to bargain 
with them relative to its assigned importance 
by study-group bargainers. 

Table 5 shows the highly significant inter- 
action (p < .01) appearing in Experiment 2. 
The issue was seen as far more important 
for competing strategists (3.46) and for 
competing negotiators, both of whom had 
studied the issues (4.63), than for those who 
had been treated differently (8.46 and 8.00). 
In Experiment 3, the large effect (p < .01) 
was evidenced in the greater regard shown 
for the issue by company than union re- 
spondents (5.10 versus 7.72). 
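
The "similar versus different preparation" pattern described for Experiment 2 can be made concrete with a small calculation. The following is only an illustrative sketch in Python: the four values are the 96-respondent mean ranks for the union representative issue as read from Table 5, while the variable names and the grouping into "similar" and "different" pairings are introduced here for illustration and are not part of the original analysis.

```python
# 96-respondent mean ranks for "union representative on board" (Table 5).
# Keys are (union preparation, company preparation); lower rank = more important.
means = {
    ("strategy", "strategy"): 3.46,
    ("strategy", "study"): 8.00,
    ("study", "strategy"): 8.46,
    ("study", "study"): 4.63,
}

# Pairings in which both sides prepared the same way versus differently.
similar = [rank for (union, company), rank in means.items() if union == company]
different = [rank for (union, company), rank in means.items() if union != company]

similar_mean = sum(similar) / len(similar)
different_mean = sum(different) / len(different)

# A positive gap means the issue was ranked as less important when the two
# sides had prepared differently.
gap = different_mean - similar_mean

print(f"similar: {similar_mean:.3f}  different: {different_mean:.3f}  gap: {gap:.3f}")
```

The same grouping could be applied to any other row of Table 5 where the text reports an interaction of the two sides' prenegotiation treatments.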

Checkoff. This was judged least important 
in all three experiments but effects were 
mixed in each experiment. The union attached
significantly (p < .01) more importance
to it (7.13) than did the company,
which ranked it 8.16 in Experiment 1.
In Experiment 2, the reverse occurred with 
company respondents ranking the issue at 
7.61 and union respondents ranking it 8.27. 
In Experiment 3, no significant differences 
emerged in this response. 

In general, judgments about the
importance of the issues were similar in all
three experiments and were relatively unaffected
by differential treatments. Wages,
sliding pay scales, and the medical plan were
judged more important, while the proposals
for checkoff systems, union representation
on the board, and vacation pay were judged
least important.


Preferred Length of Time for the Contract to 
Run 


In the first and third experiments, regardless
of treatments, company respondents
favored significantly (p < .01) longer-running
contracts than did union men:
36.6 and 33.2 months for company men
versus 22.8 and 21.8 months, respectively,
for union men. However, the same difference
failed to appear in the second experiment.
Here, a significant interaction effect
materialized (p < .01) for respondents as a
whole. Shorter contracts of 24.5 and 22.8
months were favored when each side had
had a different prenegotiation experience:
union study-company strategy and union
strategy-company study. When both had
planned strategies or when both had studied,
they favored longer contracts (39.0 and
37.5 months, respectively).


Acceptability of Contract Settlement to One's 
Own Side 


A negotiator in Experiment 2 was significantly
more confident (p < .05) that his
settlement was acceptable to his own side
if he had negotiated with, or as, a company
strategist rather than with, or as, a company
man who had studied the issues
(2.83 versus 2.54). No other significant
effects were observed in any of
the three experiments.


Estimate of Who Fared Better in the Settle- 
ment 


In Experiment 2, contrary to objective 
outcomes, company respondents felt the 
union came out best when the union had 
studied the issues (1.71), while the com- 
pany got the better deal when the union 
had planned strategies (2.09). Union respondents
felt the opposite, reporting that
the union fared best when it planned strate- 
gies (1.83) and not as well when it studied 
the issues (1.92). This interaction was sig- 
nificant at the 5% level. 

In Experiment 3, company respondents 


felt that when they had worked alone before 
negotiating, the company got the best deal 
(2.35); when they studied or planned 
strategies beforehand in groups, the union 
came out best (1.75). Union respondents 
showed no significant differences as a func- 
tion of treatment from a union grand mean 
of 2.02. Moreover, this mean indicated that 
they felt a sense of equity in the outcome. 
Again, the overall interaction effect was 
significant at the 5% level. 


Defensibility of One’s Own Position 

In Experiment 2, the higher order inter- 
action was significant at the 5% level. Union 
men felt their assigned position was least 
defensible (1.33) when they and their op- 
ponents had planned strategies beforehand 
and most defensible when they had studied 
the issues while the company planned 
strategies (2.42). On the other hand, com- 
pany respondents felt least defensible when 
both they and their opponents had studied 
the issues (1.83) and most defensible when 
they had studied and the unions strategized 
(2.33). 

In Experiment 3, union men as a whole
felt themselves to be in a significantly
more defensible position (p < .05) than
company respondents felt (2.30 versus 1.95).
But beyond this, among both, men from
prenegotiation group experience felt themselves
in significantly (p < .05) more defensible
positions (2.05) than those who
had studied or planned strategies alone
beforehand (1.70). (Group support evidently
increased confidence in one's own position.)


Congruence of Assigned Role to Own Be- 
liefs 

The company rather than the union role 
was more compatible in all three experiments 
with the beliefs of these business school 
subjects, but the differences in compati- 
bility were significant (and to a modest 
degree, p < .05) only in Experiment 3. 
In Experiment 2, the biggest effect (p < .01) 
was associated with the interaction of pre- 
negotiation treatments. Congruence was 
higher (2.83 and 2.96) when union role 
players studied and company role players 
planned strategies and vice versa. Congruence
was lower (1.83 and 2.17) when both
sides planned strategies or studied before- 
hand. 


Quality of Opponent’s Performance as a 
Negotiator 


Union strategists of Experiment 2 judged 
their opponent's performance as significantly 
(p < .01) better (4.33) than did union men 
who had studied beforehand (4.02). In 
Experiment 3, opponents were judged more 
favorably by union men who had worked 
alone (4.6) and by company strategists who 
had worked alone (4.7). They were judged 
significantly (p < .05) less favorably by
union men with group experience (4.10). 


ORIENTATION OF THE BARGAINERS 


Self-, interaction-, and task-orientation 
scores (Bass, 1962) were available for the
negotiators of Experiments 1 and 3. To
see the extent to which the orientation of the bargaining
team affected contract outcomes, the sums
of self-orientation scores for each bargaining 
pair and the differences between the self- 
orientation scores for each bargaining pair 
were calculated. The same was done for the 
interaction- and the task-orientation scores. 
Then, for each treatment, these six values 
were intercorrelated with contract outcomes. 

Again, consistent patterns were sought 
since the likely significance of a single cor- 
relation based on eight cases is quite low. 
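
The procedure just described (a sum and a difference for each of three orientation scores, each correlated with a contract outcome) might be sketched as follows. This is a minimal illustration in Python, not the original computation: all scores and outcome values below are hypothetical, the helper names are invented here, and taking the absolute difference within a pair is an assumption, since the text does not say whether the differences were signed.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical orientation scores for the eight bargaining pairs of one
# treatment, given as (union negotiator, company negotiator).
scores = {
    "self": [(22, 30), (35, 28), (18, 25), (27, 27),
             (31, 20), (24, 33), (29, 26), (20, 22)],
    "interaction": [(28, 24), (21, 30), (33, 29), (26, 26),
                    (19, 31), (30, 22), (25, 27), (32, 35)],
    "task": [(31, 27), (25, 23), (20, 29), (34, 30),
             (28, 36), (26, 21), (23, 24), (29, 33)],
}

# Hypothetical outcome for the same eight pairs, for example the departure
# of the settlement from the going rate.
departure = [120, 80, 200, 60, 150, 90, 110, 170]

correlations = {}
for name, pairs in scores.items():
    sums = [u + c for u, c in pairs]        # combined orientation of the pair
    diffs = [abs(u - c) for u, c in pairs]  # assumed: absolute difference
    correlations[f"{name}_sum"] = pearson_r(sums, departure)
    correlations[f"{name}_diff"] = pearson_r(diffs, departure)

for label, r in correlations.items():
    print(f"{label:18s} r = {r:+.2f}")
```

With only eight pairs per treatment, any single coefficient from such a run is unstable, which is why the text looks for patterns that recur across treatments.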


Task Orientation and Negotiating Behavior 


For the eight treatments, A, B, C, D,
A', B', G, and H, the correlations between
the combined task-orientation of the pairs
of negotiators and the departure of the
settlement reached from the going rate were
respectively: -.49, -.28, -.55, -.71, .06,
.34, -.44, and -.49. On the other hand,
the differences in task orientation of the
pairs correlated respectively: .27, .14, .28,
.78, .44, -.11, .17, and -.35 with this
departure. Thus, task-oriented negotiators
tended to reach settlements closer to the
going rate. (Interestingly enough, Campbell
(1960) regards closeness to the going rate
as a criterion of the quality of the resolution.)
But the two negotiators were less likely to
reach settlements closer to the going rate
if they differed widely from each other in
task-orientation scores.

At the same time, consistent with the
meaning of task orientation, under most
treatments, pairs of negotiators
with high combined task-orientation
scores tended to regard vacation pay as less
important by assigning the issue a lower
rank, that is, 9, 8, 7, or 6 rather than 1, 2, 3, or
4. Therefore, the generally positive correlations
obtained of .20, .21, .55, .35, .14, .00,
.80, and .00 showed high task-orientation
among negotiators to coincide with the
assignment of low importance to the issue.


Self-Orientation and Negotiating Behavior 


When the negotiating pair was high in 
self-orientation according to their combined 
scores, they appear to agree slightly more on 
the importance of the nine issues. The cor- 
relations between combined self-orientation 
and the agreement of pairs of negotiators 
for the eight treatments were, respectively, 
.22, .46, .18, .28, .41, .11, —.08, and .26. 

When combined self-orientation was high 
among negotiators, they also tended to 
attach less importance to the work rules 
committee. Correlations for the eight treat- 
ments were: .52, .48, .30, —.56, .00, .23, 
.20, and .76. Likewise, they attached less 
importance to the checkoff system; correla- 
tions for the eight treatments were, re- 
spectively, .13, .12, .21, .12, .07, .62, .12, 
and .46. 

Consistent with their self-concerns, under 
seven of eight treatments, self-oriented ne- 
gotiators ranked the hospital and medical 
plan as more important an issue. Correla- 
tions were —.15, —.21, —.13, —.18, —.67, 
—.17, —.20, and .39 between the rank 
assigned the issue and combined self-orien- 
tation scores. 

Self-oriented negotiators seemed to show
different patterns as a consequence of
whether they were from a strategy planning
group or not. In Experiment 1, strategy
pairs with high self-orientation scores
reached settlements more favorable to the
company (.32) at lower costs to the company
(.47), but self-oriented negotiators
from study groups did the reverse. For them,
combined self-orientation scores
correlated -.33, -.69, and
-.40 with degree of favorableness of the
settlement for the company, and the combined
scores correlated -.31, -.74, and
-.18 with costs of the settlement to the
company. In line with these results, in
Experiment 3, the combined self-orientation
of negotiators correlated .52 with
favorability of settlement to the company
for negotiators who planned strategies alone
and .22 for negotiators who planned strategies
in groups. Yet, the correlation was
-.63 for negotiators who studied alone and
.07 for those who studied in groups beforehand.

Differences in the self-orientation of negotiators
were not related in any consistent
way to negotiating outcomes. Likewise,
neither the combined interaction-orientation
scores for each pair of negotiators nor the
differences between them showed consistent
correlations with any negotiation outcomes.


POSTSESSION CRITIQUE 


Pairs who had deadlocked and those who 
reached agreement quickly were queried 
about the reasons for their outcomes. 

Some deadlocked partisans said they were 
unconcerned about the length or cost of the 
strike. Others argued that there was no 
reason to settle for less than the going rate 
since they assumed that workers could get 
jobs elsewhere. 

One told of his negotiating process, which
seemed prone to failure. He would only
negotiate on a one-for-one basis and would
not give any more than he felt he received
in swapping issue by issue.

Two negotiators, who deadlocked and sat 
back-to-back in stony silence for the last 
part of the negotiating session, did not need 
to comment about their emotional involve- 
ment; but another deadlocked partisan felt 
he had been carried away by the role and 
had behaved completely realistically. 

Early success in attaining what is re- 
garded as an important concession and 
concern about the cost of the strike pro- 
duced early settlement. 


... we agreed almost immediately on a 6-cents-an-hour
increase, which my group had figured
was the most important matter. On the less important
items, I put a premium on time and did
not worry about pennies. I figured it was better
to remain below the going rate but settle the strike
quickly.


An agreeable opponent also helped: 


. . . We started out by going through the few things
that I was going to really build up as something 
big to trade on, and he went right through (the 
list of issues) and just gave them to me right away 
and so I just let him keep going. . . . 


One pair seems to have exemplified Os- 
good’s (1962) gradualism without knowing 
it. First, one negotiator made a small con- 
cession. This was followed by one from the 
other and so on down the line until full 
agreement was quickly reached. Yet, an 
astute negotiator commented: 


One thing I noticed (with strategy groups) was 
you could begin to detect after a while what their 
strategy was. For example, they would concede 
the smaller issue and skip over the most im- 
portant, wages, and then hopefully come back 
later and use the argument, “oh, since we gave 
you that, how about... .” 


Postsession Interviews 


It was possible to complete individual 
postexperiment interviews with 39 of the 46 
negotiators who had planned strategies for 
Experiment 3. 

The two most popular methods of approaching
the problem of planning strategies
were either first to rate the issues for
trading off (N = 10) or first to compare 
issues with community norms (N = 9). 
Four solitary planners began by rating 
issues according to costs, but no groups 
did this. 

Of the 32 who reported having a primary 
goal in mind, 11 of the 15 union planners 
were aiming to maximize monetary gains, 
while only 7 of the 17 company planners 
were primarily out to minimize monetary 
costs. Eleven of 12 solitary planners with 
primary goals were concentrating on mone- 
tary gain while only 7 of 20 group planners 
focused first on money. Other primary goals 
distributed relatively evenly among the 
various planners included a settlement near 
the community average, bettering long- 
range relations between parties and im- 
mediately ending the strike. 
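
The interview counts on primary goals can be tabulated to show the contrasts at a glance. The sketch below, in Python, uses only the counts reported in the paragraph above; the dictionary layout and function names are illustrative choices, and no significance test is implied.

```python
from fractions import Fraction

# (planners whose primary goal was monetary, planners reporting any primary goal)
by_side = {"union": (11, 15), "company": (7, 17)}
by_setting = {"alone": (11, 12), "group": (7, 20)}


def shares(table):
    """Proportion of planners in each category who focused first on money."""
    return {key: Fraction(money, total) for key, (money, total) in table.items()}


def totals(table):
    """Column totals, used to confirm both breakdowns cover the same 32 planners."""
    money = sum(m for m, _ in table.values())
    planners = sum(n for _, n in table.values())
    return money, planners


for label, table in (("side", by_side), ("setting", by_setting)):
    for key, share in shares(table).items():
        print(f"{label:8s} {key:8s} {float(share):.2f}")
    print(f"{label:8s} totals   {totals(table)}")
```

Both breakdowns sum to 18 money-focused planners out of 32, so the two splits describe the same interviewees.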
While 15 of 16 company negotiators indicated
that they had consciously considered
what the union’s goals might be, only 6 of 
13 union negotiators had considered the 
company’s aims. Twelve of 17 planning 
alone worried about what the other side 
was after; again, 9 of 12 in groups reported 
considering what the other side’s goals might 
be. 

Six company negotiators saw themselves
pushing for a package deal; only one union
man took this bargaining approach. Seven
union negotiators said they had worked to
trade off issues; only four company men
were interested in applying this strategy.
Other less frequently employed strategies
included: arranging a priority of issues to be
discussed (N = 6); minimax solutions
(N = 4); pursuing one issue at a time
(N = 2); finding the other party’s opinion
first (N = 2); and attacking the other side
(N = 1).

Only 5 union and 1 company negotiator 
of the 39 were definitely convinced that the 
nonmonetary issues were very important. 

While two-thirds of company negotiators 
felt committed to their prebargaining strat- 
egy, only half the union respondents were 
so committed. As might be expected, two 
of every three strategists coming from groups 
felt such commitment while only one of 
every two strategists who had planned 
alone felt similarly committed. Approxi- 
mately the same ratios held in response to 
whether the actual bargaining proceeded 
according to plan or had to be revised. 
While 13 of 15 company negotiators would 
use the same strategy again, only 7 of 12 
union bargainers would do so. Among those 
who had planned alone, there was a 12 to 1 
preference to shift to planning strategies 
in groups while of those who had planned 
in groups, only half would prefer to switch 
to planning strategies alone. 


CONCLUSIONS 


Strategy versus Study Experience 

The variety of contract outcomes points 
to the importance of focusing on how nego- 
tiators prepare themselves in advance of 
bargaining. Negotiators who had planned
strategies in groups rather than studied the
issues in groups were more likely to deadlock
in the face of deadlines, but they could
also achieve speedy settlements if the
strategies they formulated happened to
overlap with those of their counterparts.

Speedy settlements were not necessarily 
good settlements. In fact, if Campbell’s 
(1960) criterion is accepted that the quality 
of the settlement is given by the closeness 
of the outcome to community norms, then 
speed was inversely related to quality under 
all 12 treatments. In all treatments, nego- 
tiating time was negatively correlated with 
departure of settlement from going rates 
in the community. 

More tightly controlled experiments can 
be effected here, possibly yielding a better 
and simpler criterion of quality by eliminat- 
ing the nonmonetary issues from the nego- 
tiations and suggesting nonoverlapping 
monetary goals to the conflicting parties, 
that is, $60,000 gain for union, $35,000 cost 
to company. (A pilot study with 85 con- 
tracts negotiating only monetary issues 
yielded a mean settlement of $49,638.) 
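
One way to read a settlement against such nonoverlapping goals is to place it on a scale running from the company's goal to the union's goal. The Python sketch below does this for the pilot mean. It assumes, for illustration only, that the pilot contracts were negotiated under the $60,000 and $35,000 goals mentioned above, which the text does not state explicitly; the 0-to-1 position index is likewise an invention of this example rather than a criterion proposed in the study.

```python
union_goal = 60_000     # suggested gain for the union, in dollars
company_goal = 35_000   # suggested cost to the company, in dollars
pilot_mean = 49_638     # mean settlement reported for the pilot study

# Midpoint between the two goals, a natural "split the difference" reference.
midpoint = (union_goal + company_goal) / 2

# Position of the settlement between the goals:
# 0.0 = exactly the company's goal, 1.0 = exactly the union's goal.
position = (pilot_mean - company_goal) / (union_goal - company_goal)

print(f"midpoint: ${midpoint:,.0f}")
print(f"pilot mean is ${pilot_mean - midpoint:,.0f} above the midpoint")
print(f"position between goals: {position:.3f}")
```

Under these assumptions a single dollar figure per contract would serve as the simpler criterion of quality that the paragraph calls for.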

In comparison with settlements by those 
who had studied the issues, settlements by 
strategists departed more widely from the 
going rates in the first two but not the third 
experiment. Prenegotiation study favored 
better settlements for the company in the 
first and third, but not the second experi- 
ment. However, in all three experiments, 
strategists tended to agree more with their 
opponents in the rank importance of the 
issues than did those who studied the issues 
unilaterally in advance. Again, the complex
outcomes appear to call for simplifying
required negotiations. Nevertheless, maximum
agreement on the relative importance
of the issues can be effected by providing
prenegotiation experience in bilateral study 
groups with one’s future opponents and 
avoiding bilateral study groups with counter- 
parts with whom one will not subsequently 
have to negotiate. And such agreement in 
advance seemed to speed negotiations some- 
what, although it had no consistent associa- 
tion with the direction of the settlement. 

There were relatively few very specific 
differences resulting from prenegotiation 
experience in study or strategy planning in
groups. Studying in groups seemed to result 
in more decisions for departmental rather 
than plant-wide seniority and greater im- 
portance being attached to the hospital and 
medical plan. Strategy planning resulted in 
greater importance being assigned to the 
proposed union representative on the 
company board. 

Rather than pursue this question about 
the differential effects of study rather than 
strategy on contract outcomes, our mixed 
bag of results and the critiques that follow 
suggest that it may be more profitable to 
move into an examination of specific tactics 
which may increase the speed of contract 
resolution and/or enhance the quality of
outcomes. Requiring resolution of only the 
five monetary issues, different samples of 
prenegotiation strategists can be asked to 
plan in groups as follows: 


1. Aim to obtain a settlement at the 
community norm 
2. Minimize cost (or maximize gain) 
3. Develop tactics on estimates of the 
opposing party’s goals 
4. Use as a strategy:
a. Package deal 
b. Trade offs 
5. Begin negotiating with: 
a. The most important issue 
b. The least important issue 
c. By a search of the opposition’s 
views. 


Group versus Individual Prenegotiation Prep- 
aration 


Group commitment cumulated with stra- 
tegic thinking to hold bargainers to the 
longest decision times. Group experience
produced deadlocks which never occurred as 
a consequence of individual preparation. 
Opposing parties who had met beforehand 
in unilateral groups were in greater agree- 
ment about the importance of the issues 
(yet took longer to reach contractual agree- 
ment) than those who had prepared alone. 
Those who prepared in groups felt they had 
more defensible positions than those who 
prepared individually. Most other specific 
variables were not appreciably affected al- 
though group preparation raised the saliency 
of the hospital-medical plan and lowered 
the perceived importance of the seniority 
question. 

In short, the reinforcing properties of 
group experience were revealed. As a general 
procedure, opposing positions of negotiators 
can be readily hardened by prenegotiation 
reinforcement in unilateral groups. 


Personal Orientation 


When task-orientation is high among 
negotiators, settlements are closer to going 
rates—which may be interpreted as illus- 
trative of high-quality resolution; but when 
one negotiator is high and the other low in 
task orientation, the reverse is true about 
contract outcomes. It would seem profitable 
to follow up this finding with a replication 
with a larger sample and with a more sim- 
plified set of negotiations to yield a simpler 
monetary criterion of quality for contract 
outcomes. 

Such a simpler negotiation problem has 
been formulated. Results obtained with 
it will be the subject of subsequent re- 
ports. 
TABLE A1
ANALYSES OF VARIANCE OF THE EFFECTS ON CONTRACT OUTCOMES OF WHETHER UNION
AND/OR COMPANY PLANNED STRATEGIES OR STUDIED ISSUES BEFORE NEGOTIATING
(EXPERIMENT 2)

                                            Treatment
                              (i) Strategy    (j) Strategy    (i × j)
Dependent variable            versus study,   versus study,   Interaction    Error      Total variance
                              union           company         effect
                              1 df            1 df            1 df           44 df      47 df
Duration of strikeᵃ                                                                     85.44
  Mean squares                32.01           24.65           10.08          9.51
  F ratio                     3.36            2.59            1.06
Departure from going rate                                                               547881.
  Mean squares                6984.           581.            8086.          12096.
  F ratio                     .577            .005            .668
Settlement favors company                                                               400182.
  Mean squares                2146.           62424.          196.           7623.
  F ratio                     .282            8.189**         .025
Agreement on issues                                                                     5.098
  Mean squares                .3798           .0150           .0728          .1052
  F ratio                     3.61            .143            .692

ᵃ These data exclude the deadlocks. When deadlocks are included in the analyses, converted to the
reciprocal of time with the assumption that a deadlock was infinitely long in duration so that 1/t = 0,
the attained mean variance in harmonic mean time among the four treatments was significant at the
5% level.
** p < .01.
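
The footnote's treatment of deadlocks can be illustrated with a short calculation. The following Python sketch shows one way a harmonic mean of negotiating times could be taken when a deadlock is counted as infinitely long, so that its reciprocal contributes zero. The function name and the sample times are hypothetical, and nothing here is meant as a reconstruction of the original analysis of variance on reciprocals.

```python
import math


def harmonic_mean_time(durations):
    """Harmonic mean of durations; math.inf marks a deadlock (reciprocal 0)."""
    reciprocals = [0.0 if math.isinf(t) else 1.0 / t for t in durations]
    mean_reciprocal = sum(reciprocals) / len(reciprocals)
    # If every pair deadlocked, the mean reciprocal is 0 and the mean time is infinite.
    return math.inf if mean_reciprocal == 0 else 1.0 / mean_reciprocal


# Hypothetical negotiating times for one treatment; the last pair deadlocked.
times_with_deadlock = [10.0, 20.0, 20.0, math.inf]
times_settled_only = [10.0, 20.0, 20.0]

print(harmonic_mean_time(times_settled_only))   # deadlock excluded
print(harmonic_mean_time(times_with_deadlock))  # deadlock counted as 1/t = 0
```

Counting the deadlock lengthens the harmonic mean without making it infinite, which is what allows deadlocked pairs to stay in the comparison among treatments.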
TABLE A2
ANALYSES OF VARIANCE OF THE EFFECTS ON SPECIFIC SETTLEMENTS OF WHETHER UNION
AND/OR COMPANY PLANNED STRATEGIES OR STUDIED ISSUES BEFORE NEGOTIATING
(EXPERIMENT 2)

                                            Treatment
                              (i) Strategy    (j) Strategy    (i × j)
Dependent variable            versus study,   versus study,   Interaction    Error      Total variance
                              union           company
                              1 df            1 df            1 df           44 df      47 df
Hospital-medical                                                                        17.67
  Mean squares                .08             .00             2.08           .35
  F ratio                     .23             .00             5.91*
Wages                                                                                   45.98
  Mean squares                .19             .52             3.52           .94
  F ratio                     .20             .55             3.71
Sliding pay                                                                             11.25
  Mean squares                .75             .33             .33            .22
  F ratio                     3.36            1.49            1.49
Seniority                                                                               11.25
  Mean squares                .08             .75             .75            .22
  F ratio                     .38             3.4             3.4
Union representativeᵃ
Night-shift differential                                                                155.81
  Mean squares                22.69           11.02           1.69           2.73
  F ratio                     8.29**          4.03*           .62
Vacation pay                                                                            29.98
  Mean squares                .02             1.69            1.02           27.25
  F ratio                     .03             2.72            1.65
Work rules committee                                                                    11.67
  Mean squares                .08             2.08            .33            .21
  F ratio                     .40             10.00**         1.60
Checkoff                                                                                11.81
  Mean squares                .52             .02             .18            .25
  F ratio                     2.07            .08             .74

ᵃ No variance in response as a function of treatment.
* p < .05.
** p < .01.
TABLE A3
ANALYSES OF VARIANCE OF THE EFFECTS ON SATISFACTION WITH CONTRACTS AS A FUNCTION
OF WHETHER UNION AND/OR COMPANY PLANNED STRATEGIES OR STUDIED ISSUES
BEFORE NEGOTIATING (EXPERIMENT 2)

                                            Treatment
                                   (i) Strategy    (j) Strategy    (i × j)
Dependent variable                 versus study,   versus study,   Interaction    Error      Total variance
                                   union           company
                                   1 df            1 df            1 df           92 df      95 df
Preferred length?                                                                            16583.
  Mean squares                     5133.           63.37           .3750          123.7
  F ratio                          41.47**         .512            .003
Other negotiator's performance?                                                              33.98
  Mean squares                     .0938           1.260           2.344          .3293
  F ratio                          .285            3.828           7.118**
Acceptable contract?                                                                         30.62
  Mean squares                     .3750           2.0417          .0417          .3062
  F ratio                          1.225           6.668*          .1360
Who fared better?                                                                            27.24
  Mean squares                     .5104           .0938           .2604          .2867
  F ratio                          1.780           .3270           .9080
Defensible position?                                                                         37.83
  Mean squares                     6.000           .1667           .0000          .3442
  F ratio                          17.43           .4840           .0000
Congruence of role?                                                                          71.74
  Mean squares                     19.26           1.260           .2604          .5539
  F ratio                          34.77**         2.276           .470

* p < .05.
** p < .01.




TABLE A4
ANALYSES OF VARIANCE OF THE EFFECTS ON JUDGED RANK IMPORTANCE OF ISSUES AS A
FUNCTION OF WHETHER UNION AND/OR COMPANY PLANNED STRATEGIES OR STUDIED
ISSUES BEFORE NEGOTIATING (EXPERIMENT 2)

                                            Treatment
                              (i) Strategy    (j) Strategy    (i × j)
Dependent variable            versus study,   versus study,   Interaction    Error      Total variance
                              union           company
                              1 df            1 df            1 df           92 df      95 df
Hospital-medical                                                                        243.9
  Mean squares                5.042           .3750           1.042          2.581
  F ratio                     1.953           .145            .403
Wages                                                                                   125.4
  Mean squares                6.510           .5104           .2604          1.284
  F ratio                     5.07*           .398            .203
Sliding pay                                                                             419.9
  Mean squares                44.01           17.51           1.260          3.881
  F ratio                     11.34**         4.51*           .325
Seniority                                                                               429.6
  Mean squares                51.04           12.04           .3750          3.980
  F ratio                     12.82**         3.02            .094
Union representative                                                                    807.2
  Mean squares                420.8           3.01            15.84          4.647
  F ratio                     90.6**          .648            3.41
Night-shift differential                                                                270.7
  Mean squares                82.51           .0937           1.260          2.031
  F ratio                     40.6**          .046            .620
Vacation pay                                                                            287.2
  Mean squares                14.26           .5104           17.51          2.771
  F ratio                     5.14*           .184            6.32*
Work rules committee                                                                    370.2
  Mean squares                .5104           36.26           1.260          3.611
  F ratio                     .141            10.04**         .3490
Checkoff                                                                                197.6
  Mean squares                13.50           .6667           .3150          1.990
  F ratio                     6.78*           .935            .188

* p < .05.
** p < .01.
TABLE A5
ANALYSIS OF VARIANCE OF THE EFFECTS ON CONTRACT OUTCOMES OF WHETHER UNION
AND COMPANY PLANNED STRATEGIES OR STUDIED ISSUES IN GROUPS OR ALONE
BEFORE NEGOTIATING (EXPERIMENT 3)

                                            Treatment
                              (i) Study       (j) Group       (i × j)
Dependent variable            versus          versus          Interaction    Error      Total variance
                              strategy        alone
                              1 df            1 df            1 df           32 df      35 df
Duration of strikeᵃ                                                                     351.79
  Mean squares                13.44           31.21           44.10          10.75
  F ratio                     1.25            2.90            2.52
Departure from going rate                                                               307271.
  Mean squares                35094.          84.             14240.         8034.
  F ratio                     4.37*           .105            1.77
Settlement favors company                                                               483510.
  Mean squares                5244.           2093.           203.           1275.
  F ratio                     4.11            1.64            .159
Agreement on issues                                                                     3.049
  Mean squares                .0072           .1034           .0078          .1103
  F ratio                     .065            .937            .071

ᵃ See Footnote a of Table A1. Here the treatment effect was significant at the 1% level when
harmonic means were calculated.
* p < .05.
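
Each F ratio in these appendix tables is the mean square for an effect divided by the error mean square of the same row. The Python sketch below recomputes two rows of Table A5 from the mean squares as read from the scan, as a consistency check a reader could extend to other rows; the dictionary layout and function name are illustrative, and rounding to three places is a choice made here.

```python
# Mean squares read from Table A5, in the order:
# (study versus strategy, group versus alone, interaction, error).
mean_squares = {
    "Settlement favors company": (5244.0, 2093.0, 203.0, 1275.0),
    "Agreement on issues": (0.0072, 0.1034, 0.0078, 0.1103),
}


def f_ratios(row):
    """Divide each effect mean square by the error mean square of the row."""
    *effects, error = row
    return [round(ms / error, 3) for ms in effects]


for name, row in mean_squares.items():
    print(name, f_ratios(row))
```

Rounded to the precision printed in the table, the recomputed ratios for these two rows agree with the tabled F ratios (4.11, 1.64, .159 and .065, .937, .071).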
TABLE A6
ANALYSIS OF VARIANCE OF THE EFFECTS ON SPECIFIC SETTLEMENTS OF WHETHER UNION
AND COMPANY PLANNED STRATEGIES OR STUDIED ISSUES IN GROUPS OR ALONE
BEFORE NEGOTIATING (EXPERIMENT 3)

                                            Treatment
                              (i) Study       (j) Group       (i × j)
Dependent variable            versus          versus          Interaction    Error      Total variance
                              strategy        alone
                              1 df            1 df            1 df           32 df      35 df
Hospital-medical                                                                        10.75
  Mean squares                .69             .03             .03            .31
  F ratio                     2.23            .097            .097
Wages                                                                                   81.56
  Mean squares                .11             4.00            1.00           2.39
  F ratio                     .05             1.67            .42
Sliding pay                                                                             8.56
  Mean squares                .11             .11             .11            .26
  F ratio                     .42             .42             .42
Seniority                                                                               8.56
  Mean squares                1.00            .11             1.00           .20
  F ratio                     5.00*           .55             5.00*
Union representativeᵃ
Night-shift differential                                                                93.64
  Mean squares                6.94            6.94            8.03           2.03
  F ratio                     2.64            2.64            3.05
Vacation pay                                                                            26.75
  Mean squares                .25             .03             .69            .81
  F ratio                     .31             .04             .85
Work rules committee                                                                    7.04
  Mean squares                2.25            .03             .03            .17
  F ratio                     13.24**
Checkoff                                                                                8.56
  Mean squares                .00             .44             1.00           .22
  F ratio                     .00             2.00            4.55*

ᵃ No variance as a function of treatment.
* p < .05.
** p < .01.


TABLE A7

ANALYSIS OF VARIANCE OF THE EFFECTS ON SATISFACTION WITH CONTRACTS AS A FUNCTION
OF WHETHER UNION AND COMPANY PLANNED STRATEGIES OR STUDIED ISSUES IN
GROUPS OR ALONE BEFORE NEGOTIATING (EXPERIMENT 3)

Treatment effects (including union versus company respondent) and interactions
(i × j), (j × k), and (i × j × k): 1 df each. Error: 72 df. Total variance: 79 df.

Dependent variable                   Treatment effects, interactions, and error            Total variance
Preferred length?                                                                          13736
  Mean squares                       405.0   16.2   2   19.0
  F ratio                            3.40   14   90*
Other negotiator's performance?                                                            25.55
  Mean squares                       .450   1.250   1.250   .272
  F ratio                            1.65   4.60*   4.60*
Acceptable contract?                                                                       43.89
  Mean squares                       .1125   .6125   .0125   .0125   1.5125   .5480
  F ratio                            .85   1.12   .023   2.76
Who fared better?                                                                          30.89
  Mean squares                       25   .1125   2.1125   .3125   .1125   .3708
  F ratio                            .80   .843   .30
Defensible position?                                                                       42.75
  Mean squares                       .4500   .0500   .0500   1.2500   .0500   .4889
  F ratio                            .102   2.56   .102
Congruence of role?                                                                        46.80
  Mean squares                       .0000   .2000   .0000   .2000   .8000   .5778
  F ratio                            .00   .346   .946   1.384   .946

* p < .05.
** p < .01.
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TABLE A8 
ANALYSIS OF VARIANCE OF THE EFFECTS ON JUDGED RANK IMPORTANCE OF ISSUES AS A
FUNCTION OF WHETHER UNION AND COMPANY PLANNED STRATEGIES OR STUDIED
ISSUES IN GROUPS OR ALONE BEFORE NEGOTIATING (EXPERIMENT 2)

                          Treatment                  Interactions
Dependent variable        (i)      (j)      (k)      (i × j)  (i × k)  (j × k)  (i × j × k)  Error    Total
                          1 df     1 df     1 df     1 df     1 df     1 df     1 df         72 df    79 df
Hospital-medical                                                                                      288.0
  Mean squares            7.220    .1800    1.280    .0450    .6050    2.645    .0800        .2325
  F ratio                 31.05**  .77      5.51*    .19      2.60     11.38**  .34
Wages                                                                                                 62.0
  Mean squares            .3125    .6125    .6125    .3125    3.613    6.613    .0125        .6931
  F ratio                 .45      .88      .88      .45      5.21     9.54**   .02
Sliding pay                                                                                           284.9
  Mean squares            55.13    10.13    36.13    3.125    1.125    36.13    36.13        32.10
  F ratio                 1.72     .32      1.12     .10      .85      1.12     1.12
Seniority                                                                                             300.2
  Mean squares            11.25    11.25    6.050    7.200    .2000    24.20    .050         3.333
  F ratio                 3.38     3.38     1.82     2.16     .06      7.26**   .02
Union representative                                                                                  539.4
  Mean squares            137.8    .6125    2.813    2.813    7.813    1.513    5.513        5.285
  F ratio                 26.07**  .01      .53      .53      1.48     .29      1.04
Night-shift differential                                                                              223.8
  Mean squares            39.20    .8000    1.800    9.800    .000     7.200    .2000        2.289
  F ratio                 17.13**  .35      .79      4.28     0.0      3.15     .09
Vacation pay                                                                                          169.5
  Mean squares            2.813    .0125    .6125    .1125    .1125    3.613    9.112        2.126
  F ratio                 1.32     .01      .29      .05      .05      1.70     4.29
Work rules committee                                                                                  435.0
  Mean squares            52.81    5.513    3.613    4.513    15.31    12.01    23.11        4.432
  F ratio                 11.92**  1.24     .82      1.02     3.45     2.71     5.21
Checkoff                                                                                              211.6
  Mean squares            5.000    3.200    4.050    .0500    .8000    .2000    8.450        2.636
  F ratio                 1.90     1.21     1.54     .02      .304     .076     3.21

(i) Study versus strategy; (j) Group versus alone; (k) Union versus company respondent.

* p < .05.
** p < .01.

THE CLASSIFICATION OF CHILDREN’S 
PSYCHIATRIC SYMPTOMS: 

Symptoms from the case histories of 300 male and 300 female child psychi- 
atric patients were analyzed, separately for each sex, by the principal-factor 
method with quartimax, varimax, and oblimin rotations. Classification of 
the Ss according to the 1st principal factor and the reliable rotated factors 
showed that symptom clusterings at 2 levels of generality were present: 
there was a general polar dichotomy given the label Internalizing versus 
Externalizing, and there were several specific syndromes, some resembling 
traditional psychiatric diagnoses and some peculiar to certain develop- 
mental stages. Biographical differences among the Ss suggested that the 
Internalizing-Externalizing dichotomy and those specific syndromes sub- 
sumed by it reflected differences in socialization, while the syndromes not 
subsumed by it did not reflect socialization differences. The factors obtained 
can be used to classify child psychiatric patients for research purposes. 


The advent of modern psychiatry was 
heralded in part by the introduction 
in 1883 of Kraepelin’s diagnostic system. In 
its earliest forms, this system rested on the 
assumption that all mental disorders were 
due to physical pathology in the brain. 
Kraepelin’s goal was to devise categories of 
disorder based upon descriptions of the 
symptom syndromes manifested and upon 
the observed courses of illness. It was ex- 
pected that medical research would even- 
tually uncover different physical etiologies 
for the different categories thus formulated. 

Although Kraepelin himself soon focused 
his interest upon psychological processes 
and appears by 1900 to have abandoned the 
dogma that all mental disorders were due 
to brain dysfunction, the medical disease 
model has continued to represent one pos- 
sible format for psychiatric diagnosis. It 
would appear that, for mental disorders 
having some specific physical agent or mal- 
function as the necessary, sine qua non, 
cause, the orthodox disease model is indeed 
the appropriate one. Like other organic dis- 
eases, such disorders should ultimately be 
defined and treated with reference to the 
physical etiology. 

A second possible model for some cate- 
gories of mental disorder is illustrated by 
the conjectures of Rado (1953) and Meehl 
(1962) on the nature of schizophrenia. Ac- 
cording to this model, certain inherited or 
constitutional anomalies are the necessary 
but not sufficient causes of the clinical dis- 
order. Given the prerequisite anomaly, the 
typical social learning regime will result in 
a personality organization which is susceptible 
to clinical schizophrenia. Given other 
constitutional weaknesses and/or serious 
psychological stresses, the schizotypic personality 
can potentially manifest a clinical 
schizophrenic reaction. The disorder is thus 
the product of the interaction between a 
specific constitutional attribute, the learn- 
ing history resulting when an individual 
with such a constitutional attribute is sub- 
jected to the usual environmental regime, 
and the presence of precipitating stress. 

A third possible model for mental dis- 
orders is that of the behavior theorists. 
They (e.g., Bandura, 1964; Bandura & 
Walters, 1963) maintain that behavioral 
deviations should be regarded as learned 
reactions rather than as symptomatic “disease” 
manifestations. According to this posi- 
tion, symptomatic behavior is to be ex- 
plained in terms of the social conditioning 
of the specific behavior observed, rather 
than by recourse to medical analogies. 

It is conceivable that these three models 
validly represent three different types of 
mental disorder. In order to discover cate- 
gories of disorders for which each of the 
types of model might be appropriate, it 
would be useful to have a classification sys- 
tem which provides operationally defined 
categories and which can at the same time 
play a heuristic role for further research. 
This would be especially true for disorders 
having no known physical etiology, since 
those for which a physical etiology can now 
be identified are adequately defined by that 
etiology in terms of the disease model. 

Disorders without a known physical eti- 
ology are currently lumped together in the 
“functional” category. Within this category, 
psychiatric usage tends to combine the dis- 
ease "sign" approach of organic diagnosis 
with behavioral and dynamic concepts in 
confusing and inconsistent ways. Classifi- 
catory research is needed to differentiate 
functional disorders according to different 
conceptual models. The isolation of clusters 
of empirical attributes is a logical first step 
in this direction, and this has frequently 
been approached by intercorrelating psy- 
chiatric symptoms. Among the most exten- 
sive studies of this kind³ was that by Wittenborn, 
Holzberg, and Simon (1953). With 
the help of several psychiatrists, they con- 
structed 55 symptom scales, each of which 
contained three or four objectively de- 
scribed behaviors. The symptoms occurring 
in a sample of state hospital patients were 
recorded by psychiatrists and factor an- 
alyzed by the centroid method, with an 
orthogonal rotation. Nine clusters, all sim- 
ilar to traditional diagnostic categories, 
were found and were given the following 
labels: acute anxiety, conversion hysteria, 
manic state, depressed state, schizophrenic 
excitement, paranoid condition, paranoid 
schizophrenic, hebephrenic schizophrenic, 
and phobic compulsive. 

³ Symptom studies which were designed to test 
specific hypotheses (e.g., those of Lorr and his 
colleagues, cf. Lorr, Klett, & McNair, 1963), or 
which attempted other than empirical analyses of 
a broad range of functional symptomatology will 
not be considered here. 

With a sample of psychiatric referrals 
which included some jail and outpatient 
cases as well as state mental hospital pa- 
tients, Phillips and Rabinovitch (1958) used 
a different technique for isolating symptom 
clusters and obtained results different from 
those of Wittenborn. They checked the 
presence or absence of 46 different present- 
ing symptoms listed by a psychiatrist or 
referring physician in the case histories of 
patients. By grouping together all symptoms 
which were shown by chi-square analysis to 
be positively interrelated, they found three 
symptom clusters. Symptoms not statisti- 
cally related to the clusters were assigned 
by rational analysis and the clusters were 
interpreted as representing patterns of (a) 
avoidance of others; (b) self-indulgence 
and turning against others; and, (c) self- 
deprivation and turning against the self. 
These three clusters were confirmed in a 
new sample of case histories. 

Guertin (1952) recorded the presence or 
absence of 77 symptoms in each of 100 
hospitalized schizophrenics. An unrotated 
centroid analysis of the 52 most frequent 
symptoms produced six factors. These were 
labeled Excitement-Hostility, Retardation 
and Withdrawal, Guilt-Conflict, Confused 
Withdrawal, Persecutory-Suspicious, and 
Personality Disorganization. Except for the 
presence of psychotic symptoms in all of 
them and the greater specificity of symptom 
categories, the first three factors resemble 
the three clusters found by Phillips and 
Rabinovitch. 

Empirical studies of child symptom 
groupings have followed somewhat similar 
lines. Hewitt and Jenkins (1946) recorded 
the presence or absence of 94 symptomatic 
traits occurring in each of 500 child guid- 
ance clinic case histories. Forty-five of the 
94 traits, chosen either because of high 
frequency or “obvious clinical importance,” 
were cross-tabulated with the whole series 
of traits. By inspection, three clusters of 
traits resembling three behavior syndromes 
previously suggested by a committee of con- 
sultants were found. From the 10 to 12 
traits in each cluster, 6 or 7 were chosen to 
form the final three clusters by which cases 
were to be classified. For each cluster, traits 
were chosen which correlated at least .30 
with most of the other traits in the cluster 
and which fit the clinical picture suggested 
by the cluster. The clusters thus formed 
were interpreted as representing the Over- 
inhibited Child, the Unsocialized Aggressive 
Child, and the Socialized Delinquent Child. 
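
The .30 criterion can be made concrete with a small computation. The sketch below assumes that each trait is coded 1 (present) or 0 (absent) for every case and that the phi coefficient is the measure of association; the source does not state which coefficient Hewitt and Jenkins used, and the function name and the eight-case data are hypothetical.

```python
from math import sqrt


def phi_coefficient(trait_a, trait_b):
    """Phi correlation between two equal-length lists of 0/1 trait codes."""
    if len(trait_a) != len(trait_b):
        raise ValueError("both traits must be coded for the same cases")
    pairs = list(zip(trait_a, trait_b))
    both = sum(1 for a, b in pairs if a and b)
    only_a = sum(1 for a, b in pairs if a and not b)
    only_b = sum(1 for a, b in pairs if b and not a)
    neither = sum(1 for a, b in pairs if not a and not b)
    denominator = sqrt(
        (both + only_a) * (only_b + neither) * (both + only_b) * (only_a + neither)
    )
    if denominator == 0:
        # Phi is undefined when a trait never varies; 0.0 is returned by convention here.
        return 0.0
    return (both * neither - only_a * only_b) / denominator


# Hypothetical codes for two traits across eight case histories.
fighting = [1, 1, 1, 0, 0, 0, 1, 0]
assaultive = [1, 1, 0, 0, 0, 0, 1, 1]
phi = phi_coefficient(fighting, assaultive)
print(phi, phi >= 0.30)  # 0.5 True
```

Under this assumption, a trait would be retained for a cluster only if its coefficient with most of the cluster's other traits reached .30, in addition to fitting the clinical picture.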

In a recent study, Dreger (Dreger, 1964; 
Dreger, Lewis, Rich, Miller, Reid, Overlade, 
Taffel, & Flemming, 1964) had parents or 
parent surrogates of child clinic patients 
sort 229 discrete behavioral items according 
to whether or not the child had manifested 
them in the previous 6 months. The inter- 
correlations of 142 behavior items and demographic 
variables were analyzed by the 
principal-factor method with an oblimin 
rotation of the 10 largest factors. The ro- 
tated factors were left unlabeled, but their 
descriptions suggest that some of them re- 
semble the general clusters found in the 
Phillips-Rabinovitch, Guertin, and Hewitt- 
Jenkins studies, while others vaguely re- 
semble discrete syndromes like those found 
in the Wittenborn study. A cluster analysis 
of factor scores revealed five types of clin- 
ical cases, one of which (Egocentric Anti- 
social Aggressiveness) appears to match 
Hewitt and Jenkins’ Unsocialized Aggres- 
sive Child, and two others of which (Rela- 
tively Immature, Nonsociable, Semisurgent 
Egocentricity and Sociable Anxiety) appear 
to resemble Hewitt and Jenkins’ Overinhibited 
Child. 

Despite the theoretical emphasis, since 
Freud, upon the continuity between prob- 
lems of childhood adjustment and the occur- 
rence of adult disorders, there has been re- 
markably little synthesis in the study of the 
forms of child and adult disorders. The 
American Psychiatric Association’s Diag- 
nostic and Statistical Manual of Mental 
Disorders (1952) is relatively undifferen- 
tiated in the area of childhood disorders, 
listing only the following categories: Ad- 
justment Reaction of Infancy; Adjustment 
Reaction of Childhood, with the subcate- 
gories of Habit Disturbance, Conduct Dis- 
turbance, and Neurotic Traits; Adjustment 
Reaction of Adolescence; and, Schizophrenic 
Reaction, Childhood Type. While the Ad- 
justment Reaction categories reflect the the- 
oretical conception that such disturbances 
are transient and peculiar to certain de- 
velopmental periods, the practical result is 
that many child clinics, for lack of more 
specific categories, find it necessary to as- 
sign large proportions of their functional 
cases to these categories, with little gain in 
utility through such assignments. In child 
psychiatric research and theory, distinctions 
like that between Adjustment Reaction with 
Conduct Disturbance and Adjustment Re- 
action with Neurotic Traits appear to be 
among the most frequently employed. Ross’ 
(1959) textbook, for example, makes a 
fundamental distinction between the child 
manifesting aggressive behavior and the 
child manifesting withdrawn behavior and 
physical symptoms. A more theoretical ver- 
sion of this distinction is implied in Bard, 
Sidwell, and Wittenbrook’s (1955) proposal 
that children be classified Healthy, Asocial, 
Psychoneurotic, or Antisocial, according to 
the kind and degree of standards they had 
introjected. 

Although the empirical isolation of clus- 
ters of attributes may be a necessary first 
step in some sciences, a more advanced step 
is to achieve a classification system based 
upon theoretical principles which are as- 
sumed to determine the observed groupings. 
The proposal of Bard, Sidwell, and Witten- 
brook that children be classified according 
to the degree and kind of standards they 
had introjected illustrates classification dic- 
tated by a theoretical principle. Such a clas- 
sification could be regarded as an advance 
over purely empirical classification if it 
provides reliably discriminable categories 
which convey a theoretical understanding 
of the phenomena and which can eventually 
guide differential action on the part of those 
using the categories. The three symptom 
clusters found by Hewitt and Jenkins for 
children might be regarded as operational 
analogues of the distinction proposed by 
Bard, Sidwell, and Wittenbrook. Similarly, 
the “self-indulgence and turning against 
others,” and “self-deprivation and turning 
against the self” symptom clusters found by 
Phillips and Rabinovitch appear to reflect 
the same distinction between adult patients 
who have and adult patients who have not 
introjected social standards. The Excite- 
ment-Hostility and Guilt-Conflict factors 
found by Guertin would seem to indicate a 
similar distinction among schizophrenic pa- 
tients. It might thus be concluded that there 
is at least some similarity in the general 
symptom patterns of childhood and adult 
functional disorders, and that the distinc- 
tion between these patterns involves, ac- 
cording to one interpretation, the presence 
or absence of introjected social standards. 
The value of this distinction has been dem- 
onstrated for child disorders by Hewitt and 
Jenkins and by Bennett (1960). Both stud- 
ies found that the “overinhibited” or “neu- 
rotic” children tended to come from stable 
homes while the “aggressive” and “delin- 
quent” children tended to come from broken 
homes where the parents presented many 
social problems. The value of the same dis- 
tinction was demonstrated for adult dis- 
orders by Zigler and Phillips (1960). They 
found that symptoms from the “self-depri- 
vation and turning against the self” cluster 
were positively related to high standing on 
the social effectiveness variables of age, 
marital status, employment history, occupa- 
tion, education, and intelligence. Symptoms 
from the “self-indulgence and turning 
against others” cluster were inversely re- 
lated to standing on the social effectiveness 
variables. Furthermore, women were found 
to manifest more symptoms of “self-depri- 
vation and turning against the self,” 
whereas men were found to manifest rela- 
tively more symptoms of “self-indulgence 
and turning against others.” Zigler and 
Phillips (1960) suggested “that both social 
effectiveness and patterns of symptomatic 
behavior represent the degree to which an 
individual tends either to resist or to conform 
to the mores of society [p. 237].” 

Besides acknowledging the parallel find- 
ings of the major distinction between clus- 
ters of “intropunitive” and “extrapunitive” 
symptoms for children and adults in the 
Hewitt-Jenkins, Phillips-Rabinovitch, and 
Guertin studies, it is difficult to reconcile 
specific inconsistencies among the empirical 
studies, to evaluate the degree of child-adult 
continuity beyond the one evident distinction, 
and to evaluate the relation of the 
traditional psychiatric syndromes, like 
those found by Wittenborn, to the more 
general symptom clusters. Although not 
necessarily invalidating their findings, cer- 
tain features of the studies prevent con- 
clusive evaluations of the empirical purity 
of the clusters found. Any delineation of 
clusters of attributes must be influenced by 
the goals of the user and the methods em- 
ployed. In attempting either a theoretical 
synthesis or further empirical work, it is 
well to consider sources of inconsistency 
which cannot be simply written off as error, 
but which must be attributed to the multi- 
faceted nature of the classificatory enter- 
prise. 

First, it would appear that the formation 
of groupings found by Wittenborn may have 
been influenced by traditional psychiatric 
stereotypes. The psychiatrists constructing 
the rating scales as well as those making 
the symptom ratings were likely to have 
been trained to expect certain symptoms to 
occur in the presence of certain other symp- 
toms and to be more alert to the occurrence 
of symptoms which coincided with the cate- 
gories to which they were accustomed. Al- 
though Wittenborn included psychiatrists of 
different theoretical persuasions, there may 
well have been sufficient overlap among 
them at the level of descriptive diagnosis to 
make this a significant source of bias. If this 
were the case, the discovery of factors re- 
sembling traditional diagnostic categories 
would not be surprising. Moreover, Wittenborn 
sought factor rotations which were 
“reminiscent of classical diagnostic con- 
cepts [Wittenborn, 1957, p. 445].” 

Guertin obtained his symptom ratings 
largely through interviews with the patients. 
The type of probing and the symptoms thus 
sought and elicited might be expected sim- 
ilarly to reflect the biases of psychiatric 
practice. The Phillips-Rabinovitch findings 
may also have been subject to this source 
of bias because the reports of a physician 
or the examining psychiatrist provided the 
data. However, the type of statistical anal- 
ysis they employed was less likely than a 
factor analysis to produce syndromes as dis- 
crete as the traditional ones. A second reser- 
vation about the empirical purity of the 
Phillips-Rabinovitch clusters is that some 
symptoms were assigned rationally, rather 
than on the basis of observed statistical relationships. 

Dreger’s study suffers from weaknesses which 
make it difficult to compare with the other 
studies. While its findings were probably not 
biased by psychiatric stereotypes, the in- 
terrater agreement between parents or par- 
ent surrogates rating the same child ranged 
from only 10% to 55%, with a mean of 36%. 
Moreover, nonpsychiatric control subjects 
(Ss) were found to exhibit proportionally 
more sadistic behaviors than were reported 
for the clinic children, suggesting a possible 
bias in the ratings by the clinic children’s 
parents. Dreger’s use of very specific “first 
order” behavior items rather than more 
general symptom categories may have in- 
creased objectivity somewhat, but at the 
same time probably resulted in a low fre- 
quency of responding in many categories 
and produced clusterings of items at a level 
too molecular to allow useful interpretation. 
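
Agreement figures of this kind can be illustrated with a simple percentage. The source does not say how Dreger computed interrater agreement, so the sketch below assumes the plainest definition: the share of items that two raters of the same child coded identically. The function name and the ten-item ratings are hypothetical.

```python
def percent_agreement(rater_1, rater_2):
    """Percentage of items on which two raters gave the same code."""
    if len(rater_1) != len(rater_2) or not rater_1:
        raise ValueError("both raters must code the same nonempty set of items")
    matches = sum(1 for a, b in zip(rater_1, rater_2) if a == b)
    return 100.0 * matches / len(rater_1)


# Hypothetical presence (1) / absence (0) codes from two parents for one child.
mother = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
father = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]
print(percent_agreement(mother, father))  # 70.0
```

A raw percentage of this kind makes no correction for chance agreement, which matters when most items are rarely endorsed.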

Finally, the Hewitt-Jenkins study in- 
cluded several highly similar symptom cate- 
gories which may explain the emergence of 
the separate Socialized Delinquent and Un- 
socialized Aggressive clusters not found in 
other studies. For example, two of the six 
symptoms correlating most highly within 
the Unsocialized Aggressive cluster were 
“assaultive tendencies” and “initiatory 
fighting.” Among the seven symptoms cor- 
relating most highly within the Socialized 
Delinquent cluster were “bad companions,” 
“gang activities,” “cooperative stealing,” 
“furtive stealing,” “truancy from home,” 
and “out late nights.” It would appear that 
in many cases the same behavior could be 
assigned to each of two similar categories. 
Furthermore, as pointed out by Jenkins 
(1964), symptoms of developmental retar- 
dation, brain damage, and schizophrenia 
were not included in the symptom check list, 
so these syndromes were unlikely to appear 
in the statistical analysis. By using reports 
from all sources in the records, Hewitt and 
Jenkins may have minimized psychiatric 
biases at the observer level, but the forma- 
tion of trait clusters appeared explicitly to 
involve much “clinical” judgment, and the 
clusters found conformed without exception 
to the judges’ expectations. 

A general consideration regarding most of 
the studies is that they employed heteroge- 
neous subject populations and failed to 
make comparisons among homogeneous seg- 
ments of their heterogeneous samples. Some 
of the studies included organic, retarded, 
racially mixed, and physically handicapped 
patients, and all of them included irregular 
age and sex distributions. It is sometimes 
possible that such correlational analyses of 
heterogeneous samples will obscure opposite 
tendencies which may be present for differ- 
ent subgroups within the samples. Aside 
from the general inconsistencies among the 
studies, comparison of the different results 
is made difficult by the different meth- 
ods of analysis employed. The cluster anal- 
yses employed in the Phillips-Rabinovitch 
and Hewitt-Jenkins studies were likely to 
produce fewer and more general groupings 
than were the rotated factor methods em- 
ployed by Wittenborn and Dreger. Guer- 
tin’s use of a diagnostically homogeneous 
sample produced clusterings which would 
probably have been obscured in a similar 
factor analysis of a diagnostically hetero- 
geneous sample. 
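
The point about heterogeneous samples obscuring opposite tendencies can be shown with a deliberately artificial example. In the sketch below, two hypothetical subgroups show perfect but opposite relationships between two symptom scores, and the pooled correlation is zero. The data and names are invented for illustration and are not drawn from any of the studies discussed.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / sqrt(variation_x * variation_y)


# Hypothetical subgroup A: the two scores rise together.
a_x, a_y = [1, 2, 3, 4], [1, 2, 3, 4]
# Hypothetical subgroup B: one score falls as the other rises.
b_x, b_y = [1, 2, 3, 4], [4, 3, 2, 1]

print(pearson_r(a_x, a_y))              # 1.0
print(pearson_r(b_x, b_y))              # -1.0
print(pearson_r(a_x + b_x, a_y + b_y))  # 0.0
```

Analyzing each subgroup separately, as the present study does by sex, recovers relationships that the pooled coefficient conceals.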


PURPOSES OF THE PRESENT STUDY 


The primary purpose of the present study 
was to attempt to elucidate, in the child 
symptom domain, the relationship between 
the general symptom clusters found in the 
Hewitt-Jenkins, Phillips-Rabinovitch, and 
Guertin studies, and the specific functional 
syndromes employed in adult psychiatry 
and found in the Wittenborn study. One 
obvious possibility is that the more specific 
syndromes are subtypes of the more general 
clusters. Lorr (1957) found evidence for this 
possibility when he subjected some of the 
Wittenborn data (Wittenborn & Holzberg, 
1951) to factor analysis by oblique and 
second-order methods. He found three sec- 
ond-order factors, the first of which repre- 
sented a “general turning against the self.” 
The second represented a “paranoid bellig- 
erence...defined by paranoid ideation, 
combativeness, and motor restlessness.” The 
third factor represented “thinking disorgan- 
ization joined with slowed psychomotor ac- 
tivity [Lorr, Klett, & McNair, 1963, p. 83].” 
These are clearly reminiscent of the three 
clusters found by Phillips and Rabinovitch, 
and two of them are suggestive of the Over- 
inhibited versus two extrapunitive clusters 
found by Hewitt and Jenkins. A maximally 
flexible factor-analytic methodology was to 
be employed in the present study to test this 
same possibility for the child-symptom do- 
main. To establish conclusions with as much 
confidence as possible, the elimination or 
minimization of the limitations evident in 
earlier studies was also sought. 

A second purpose of the study was to ob- 
tain, for research purposes, a more differ- 
entiated empirical classification of child 
psychiatric cases than is now available. As 
pointed out by Zigler and Phillips (1961), 
the value of such a purely descriptive clas- 
sificatory schema would be much enhanced 
if useful class correlates beyond the defining 
characteristics of its classes could be dis- 
covered. 

A third purpose was to classify individual 
cases according to the factors obtained in 
order to see to what extent they represented 
types of cases. If it were indeed found that 
both specific syndromes and more general 
clusterings appeared in the factor analyses, 
the classification of cases according to their 
resemblance to both types of factors could 
reveal the degree to which some factors rep- 
resented subcategories of other, more gen- 
eral, factors. 

A fourth purpose of the study was to ex- 
amine readily available biographical data 
for relationships to the classifications derived 
from the factor analyses of symptoms. 
Statistically significant relationships be- 
tween classification by the factors and 
standing on the biographical variables 
would indicate that the symptom categories 
validly discriminated cases at least in terms 
of those particular biographical items. Fur- 
thermore, comparisons with earlier studies 
which have investigated such biographical 
differences could be made. 

The final purpose of the study was heu- 
ristic. The elucidation of the relationships 
between empirical groupings of symptoms 
at various levels of generality might sug- 
gest researchable hypotheses about the 
functional relationships determining those 
groupings. In conjunction with findings of 
significant differences on the biographical 
variables and comparisons with previous 
studies, intermediate level constructs, sub- 
ject to further test, might be evolved to ex- 
plain the symptom groupings obtained and 
to aid in the choice of diagnostic models for 
further research. 


PROCEDURE 
Data Collection 


The search for empirical groupings of attri- 
butes presents several problems for the collection 
of data. Decisions as to the sources of observa- 
tions, the level of abstraction at which observa- 
tions are to be classified for analysis, and the role 
of the observers are necessary. Collecting the 
attributes of disordered behavior requires a human 
observer who must subjectively abstract from his 
experience and categorize his observations. In 
addition, the human observer is quite likely to 
influence the behavior manifested by the human 
S observed. The present study attempted to mini- 
mize the influence of systematic biases in the 
observer by using the reports of several different 
observers, having different roles with respect to 
S, as contained in case histories. It attempted to 
minimize systematic biases in the assignment of 
observations to categories for analysis by using 
raters who lacked specialized training in clinical 
practice and who would not be expected to share 
the clinical stereotypes as strongly as would trained 
practitioners. Finally, it attempted to reduce arti- 
factual clusterings of attributes by employing a 
mutually exclusive system of ratings, whereby any 
reported observation was to be entered in only 
one category and where the categories were de- 
fined so as to minimize overlap as much as possible. 

As for the level of abstraction of the rating 
categories, symptom categories were sought which 
were objective and required as little inference as 
possible, but which were not so specific as to preclude 
meaningful abstraction due either to low 
frequencies or to overly molecular units. To 
attain this goal and to give some basis for com- 
parison with earlier findings, a symptom check list 
was constructed from items which regularly ap- 
peared in previous studies, which seemed to involve 
minimal inference, which could be considered 
mutually exclusive with regard to specific observa- 
tions, and which were not excessively molecular. 
In addition, 40 case histories at the University of 
Minnesota Hospital Child Psychiatry Unit were 
read to obtain further symptom categories. By 
this means and by adding a few new symptoms 
which occurred in the data samples, a list of 91 
symptoms (Appendix A), to be checked if pres- 
ent, was constructed. In effect, the definition of 
“symptom” here approximated that put forth 
by Lorr et al. (1963, p. 3). It was not intended 
necessarily to refer to signs or tokens of internal 
disease, but only to deviant behaviors, postures, 
attitudes, or verbalizations generally accepted as 
reasons for psychiatric concern. Of the biographical 
items initially sought, some were not adequately 
reported in a sufficient number of cases, and some 
were not reported in a form amenable to sta- 
tistical analysis. The following items were ulti- 
mately recorded for analysis: school performance; 
IQ; premorbid social problems manifested by S; 
parental social problems; persons with whom S 
was residing; parental attitude toward having S’s 
problem treated; parental age when S was born; 
parental occupation and education; S’s age; num- 
ber of siblings; birth order; hometown population; 
and religion. 

The data samples of 300 male and 300 female 
case histories were obtained from the University 
of Minnesota Hospital Child Psychiatry Unit. As 
it is a teaching hospital, University Hospital’s 
records are generally more complete than might be 
the case in nonteaching institutions. Cases were 
obtained by working backward chronologically 
from 1964. Certain restrictions were employed in 
order to insure samples relatively homogeneous 
with respect to extraneous variables which might 
influence the clustering of functional symptoms. 
The following criteria were used to exclude cases 
from the samples: full scale IQ less than 75; 
good evidence for organic involvement; serious 
physical illness or severe chronic physical handicap 
(e.g., deaf, blind, crippled, spastic); the presence 
of fewer than three recordable symptoms in the 
record; age below 4 years or above 16 years at 
first psychiatric contact; race other than Cau- 
casian; institutionalization for more than 2 years; 
residence in a foster home for more than 2 years 
unless the foster parents were close relatives of 
the biological parents or unless adopted by the 
foster parents soon after birth. In addition to these 
criteria for the exclusion of cases, there was some 
selection in order to obtain a relatively symmetri- 
cal social class distribution. This required going 
back 1 to 2 years earlier than would otherwise 
have been necessary in order to secure more upper 
social class cases. To obtain a median split on age 
which would be meaningful for certain subgroup 
analyses, equal numbers of Ss in the 4-10 and 11-15 
year age groups were sought. For the boys, this 
required no selective sampling since the median of 
the sample initially obtained was 10 years. The 
low frequency of young girls necessitated some 
selectivity to obtain more of them, and it was 
possible only to obtain a distribution with a median 
age of 11. Also, cases having minimal background 
information were excluded when they could be 
replaced by more complete cases which met the 
other criteria. Both inpatients and outpatients were 
included in the samples. There was no clear-cut 
distinction between inpatients and outpatients 
since inpatients were not hospitalized for long 
periods, and many outpatients eventually became 
inpatients and vice versa. 

The influence of drugs on the symptoms re- 
corded from the present samples was probably 
minimal. There were no reports of drugs being 
administered in most of the outpatient functional 
cases, although it is possible that some of these 
received tranquilizers through physicians outside 
the hospital. The use of parent and teacher reports 
in the data collection was likely to have insured 
that predrug symptoms were recorded. Cases re- 
ceiving drugs through hospital physicians usually 
presented some evidence for organic involvement 
and would have been excluded from the samples 
for that reason. A few other cases received drugs 
while they were inpatients, but this would not 
have influenced the symptoms recorded from the 
records, since symptoms occurring only during 
inpatient treatment were excluded. 

The raters were a female college graduate with 
no specialized training in psychology, and the 
author who, during the data collection period, was 
a first- and second-year graduate student in personality
research. Approximately 10 cases were
initially rated by both raters and disagreements
were discussed. Thereafter, each case was read by
only one of the raters, except for the 25 randomly 
selected cases upon which reliability coefficients 
were calculated. The following instructions were 
provided for the recording of symptoms: 


Insofar as possible, these (symptom items) 
refer to behavior and personal reports which 
require little or no inference. The intake inter- 
view, interviews with the parents, letters of re- 
ferral from doctors, schools, courts, and welfare 
agencies, and the case summary are the best 
sources. Those symptoms which have been mani- 
fested at some time during the last three years 
are to be checked if they appear to be in some 
way a part of the reason for which the child is 
being referred. For a child of four, enuresis 
which ceased at age two and one-half and has 
not been a problem since then would not be 
included, for example. 

Caution: 1. Do not check items which appear 
in the record merely as inferences from psychological
instruments or by the interviewer; e.g.,
at some point in the evaluation of almost every
child, the inference will be made that the child
is fearful, depressed or the like, but do not
check these items unless (a) the child reports
that he is experiencing these feelings; or (b) 
there is repeated mention in the record that 
different people have observed these symptoms; 
or (c) there is clear behavioral evidence for 
these symptoms reported by parents, teachers, 
or others who have observed the child outside 
of an interview setting. 

2. Do not include items which occur only 
during the course of inpatient treatment but 
which were never present before the child was 
admitted. Do not include common items, e. g. 
headaches, which are referred to only in the 
course of the physical exam. Include such items 
only when they are of abnormal proportions or 
are also mentioned outside of the physical examination.
The initial and final sets of therapy
and supervisory staff notes should be read. 

3. Do not check more than one symptom on 
the list for any given item of behavior. For 
example, if S reports having headaches, just the 
item “headaches” should be checked, while 
“pains, physical complaints” should not be 
checked unless there is mention of other physi- 
cal complaints which are not covered specifically 
by another item like “stomachaches.” Likewise, 
if the S has a strong fear of some specific thing, 
e. g. dog phobia, the item “phobias, fears” 
should be checked, but “fearful, anxious” should 
not be checked unless it is stated that the child 
is also fearful or anxious in a general non-specific 
way. If physical causes are found for a symptom, 
do not include the symptom, e. g., if it is found 
that blurred vision is being caused by poor eyes. 

Each item on the symptom check list is to be 
regarded as the description of a class of behavior 
not entirely normal in degree. If behavior fitting 
one of these class descriptions is noted in the 
record, that class should be checked, unless the 
behavior is of apparently normal degree. For 
example, "fighting" should not be checked for a 
single mention of "fights with brother," but 
should be checked if it is frequently mentioned, 
if it appears to be of abnormal degree, or if it 
is one of the reasons for which the child was 
brought to U. M. H.; “crying” should not be
checked unless the child cries very easily or is
subject to unusual crying spells.

While the use of data from several sources of 
observations in a record, extracted by raters as- 
sumed not to share classical clinical stereotypes 
and using the methods described above, by no 
means precludes all sources of bias, the biases re- 
maining are unlikely to be as systematic as the 
ones possibly influencing the groupings found in 
previous studies. It must, however, be acknowl- 
edged that the attempts at improvement on previ- 
ous studies fall far short of ideal solutions: larger 
subject populations would have made still more 
homogeneous groupings possible, case histories 
were not always uniform in completeness and 
clarity, the symptoms sought may not have been 
properly representative of the symptom domain, 
and the implementation of the definition of 
“symptom” may have been subject to unknown 
influences.

The degree of agreement between the two
raters on the 25-case reliability check sample can 
be calculated in several different ways. The most 
straightforward method is to calculate the ratio 
of the number of symptoms on which there was 
agreement to the total number of symptoms rated. 
If the number of zero entry symptoms is in- 
cluded in this ratio, that is, the symptoms which 
were agreed by both raters to be absent, the 
average percentage of agreement for 25 cases was 
96.5 and the median was 96.7, both of which are 
probably unrealistically high since there was a 
large proportion of zero entries. If zero entries 
are not included in the ratio, the average percent- 
age of agreement was 65.5 and the median was 
71.4, which may be unrealistically low since zero 
entries did not necessarily indicate that no judg- 
ments were involved. Judgments would have been 
involved in zero entries where both raters decided 
that a certain reported observation did not deviate 
sufficiently from normal to warrant checking the 
symptom category representing it. Another reliability
formula occasionally used in such situations
(e.g., Chittenden, 1942; Marshall & McCandless,
1957) is: 2 × sum of agreements / (total checked
by A + total checked by B). Omitting zero entries,
this formula yielded an average of 79.1% agreement.
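
The three agreement indices described above can be illustrated with a short sketch. It assumes that each rater's record for a case is a list of 0/1 symptom codes, and it reads the ratio "without zero entries" as agreements divided by the symptoms checked by at least one rater. That reading, the function name, and the toy ratings are assumptions made for illustration and are not details given in the text.

```python
def agreement_indices(rater_a, rater_b):
    """Return three agreement indices for one case.

    rater_a, rater_b: equal-length lists of 0/1 codes (1 = symptom checked).
    """
    pairs = list(zip(rater_a, rater_b))
    both = sum(1 for a, b in pairs if a == 1 and b == 1)      # agreed present
    neither = sum(1 for a, b in pairs if a == 0 and b == 0)   # agreed absent (zero entries)
    checked_a = sum(rater_a)
    checked_b = sum(rater_b)
    either = checked_a + checked_b - both                     # checked by at least one rater

    return {
        # Zero entries counted as agreements.
        "with_zero_entries": (both + neither) / len(pairs),
        # Zero entries left out (assumed reading of the text).
        "without_zero_entries": both / either if either else 1.0,
        # 2 x agreements / (checked by A + checked by B).
        "two_times_agreements": (2 * both / (checked_a + checked_b)
                                 if (checked_a + checked_b) else 1.0),
    }


# Hypothetical ratings of ten symptoms for a single case.
rater_a = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
rater_b = [1, 0, 0, 0, 1, 1, 0, 0, 0, 0]
print(agreement_indices(rater_a, rater_b))
```

Because most symptoms are absent in any one case, the first index is dominated by zero entries, which is why it runs so much higher than the other two.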


Factor Analyses 


The data obtained from the case histories were 
coded and punched on IBM cards. The data for 
each sex group were analyzed separately through- 
out. Symptoms were coded 0 if not checked and 
1 if checked. All symptoms which occurred five 
or more times in a sex group were retained in the 
analyses for that sex group. Symptoms were inter- 
correlated using the product-moment correlation 
routine of the University of Minnesota Statistical 
Library Program 55 (UMSTAT 55) for orthogonal 
factor analysis. The symptom intercorrelations are 
thus represented by phi coefficients. UMSTAT 
55 was then used to obtain a principal-factor solu- 
tion for the correlation matrix. Ones were used in 
the principal diagonals in place of reliability co- 
efficients, and an eigenvalue of 1.000 (Kaiser’s 
criterion) was set as the minimum below which 
no factors would be calculated. The limit on the 
orthogonal rotation angle was set at .009 radians. 
The entire principal-factor matrix was rotated to 
the quartimax and varimax approximations to 
simple structure, using UMSTAT 55. Finally, the 
first 15 principal factors, that is, those having 
the 15 largest eigenvalues, were rotated using 
Carroll’s oblimin method, adapted to the 1604 
computer by Conrad Katzenmeyer. The gamma 
value for the oblimin rotation was set at .5. The 
fact that the oblimin factors can be intercorrelated 
means that the addition of each new principal 
factor can cause marked changes in the rotated 
factors obtained before the addition of that principal
factor. Because the program printed out the
solution each time another principal factor was
added, 14 oblimin solutions were obtained.
Although simple structure is the predominant 
rotational standard for factor analysis, its criteria
are not mathematically precise. The use of
two different orthogonal rotations and 14 oblimin 
rotations was intended to aid in the identification 
of especially reliable factors. It was expected that 
a factor which appeared in each of the various 
rotations would be a relatively reliable one since 
changes in rotational criteria did not obscure it. 
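
As a rough illustration of the first steps described above, the sketch below retains symptoms by frequency, computes a phi coefficient for two 0/1 coded symptoms, and counts eigenvalues that meet Kaiser's criterion. It is not UMSTAT 55; the function names and the toy data are hypothetical, and the eigenvalues in the last line are the four reported later in the text plus one invented value below 1.000.

```python
import math


def retained_columns(cases, min_count=5):
    """Indices of symptoms checked at least `min_count` times across cases."""
    n_symptoms = len(cases[0])
    return [j for j in range(n_symptoms)
            if sum(case[j] for case in cases) >= min_count]


def phi(x, y):
    """Phi coefficient: the product-moment correlation of two 0/1 variables."""
    n = len(x)
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    x1, y1 = sum(x), sum(y)
    denom = math.sqrt(x1 * (n - x1) * y1 * (n - y1))
    if denom == 0:
        raise ValueError("a symptom that is always or never checked has no phi")
    return (n * n11 - x1 * y1) / denom


def kaiser_count(eigenvalues, minimum=1.0):
    """Number of factors whose eigenvalue is at least `minimum`."""
    return sum(1 for value in eigenvalues if value >= minimum)


# Two hypothetical symptoms coded over eight cases.
symptom_x = [1, 1, 1, 0, 0, 0, 0, 0]
symptom_y = [1, 1, 0, 0, 0, 0, 0, 1]
print(phi(symptom_x, symptom_y))

# 5.269, 3.080, 2.684, 2.282 are reported in the text; 0.950 is invented.
print(kaiser_count([5.269, 3.080, 2.684, 2.282, 0.950]))
```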

Another objective of the factor-analytic ap- 
proach employed was to seek out hierarchical 
orderings of groupings which might be present in 
the data. The most obvious means to this end was 
through the second-order analysis of the oblique 
factors obtained in the oblimin rotations. 


RESULTS
MALES 


The Sample 


The distribution of the samples by age, 
parental social class, and diagnosis can be 
obtained from the tables in Appendix B. The 
chronological span of admission dates cov- 
ered in this sample was 1953 through 1964. 
As a result of the effort to exclude individuals
who had been adopted later than early
infancy, only five adopted cases appeared 
in the sample, the oldest age at adoption 
being 3½ months. The exclusion of individuals
institutionalized over long periods resulted
in the inclusion of only four cases
which could in any sense be considered in- 
stitutionalized prior to psychiatric referral. 
Two of these cases had been legally com- 
mitted to state-aided foster homes because 
of their difficult behavior, one was in reform 
school, and one in a detention center. In 
none of these cases did the period of “insti- 
tutionalization” exceed 5 months. Thus the 
vast majority of the boys were living at 
home with at least one biological parent or 
with close relatives. 

After excluding all symptoms which oc- 
curred less than five times in the sample of 
300 cases, 74 symptoms remained for analy- 
sis. These are indicated by the letter M on 
the symptom check list in Appendix A. 
Counting only those 74 symptoms retained 
for analysis, the mean number of symptoms 
recorded per case was 8.28. 


Factor Analyses 


In this section, the factors presented are
given descriptive names for convenience of
reference. These names were selected
through consultation with one adult and
three child clinical psychologists. The names
are not intended to convey theoretical or
interpretive implications but are merely
attempts at shorthand descriptions which
will be meaningful to psychologists. In some
instances it was difficult to get consensus on
a descriptive label, and several qualifications
accompany the label finally selected.
All minus signs are dropped from the factor
loadings presented. The items are listed as
shortened versions of those used in data
collection, but the full symptom category, as
listed in Appendix A, is always implied. For
example, “suicidal” on a factor represents
Item 45, “masochism, self-harm, suicidal,
threatens to kill self.”

Principal factors. Five principal-factor
analyses were performed on the symptom
data of the male sample. First, the symptoms
from the entire group of 300 were
intercorrelated and analyzed. Second, a
median split on age produced two groups
of 150 Ss each, ranging in age from 4-10
and 11-15, respectively. In each of these
groups, symptoms occurring four or more
times (62 for the younger males and 67 for
the older males) were intercorrelated and
analyzed. Third, a median split on social
class produced a group of 144 Ss in the
lower three classes and 155 Ss in the upper
three classes, with one S being excluded due
to lack of social class data. Symptoms occurring
four or more times (65 in the lower
class group and 66 in the upper class group)
were intercorrelated and analyzed separately
for each group. Age and social class
were evidently independent in these
dichotomies since approximately equal
numbers of lower and upper class Ss fell
into the younger and older groups. The
median splits were rather coarse breakdowns,
but meaningful factor analyses
would have been precluded by the small Ns
resulting from finer breakdowns.

In a previous brief summary of this research
(Achenbach, 1965), some of the factors were given
labels slightly different from those reported here.
In addition, a few small rotated factors which
classified very few Ss, the second principal factors
for both sexes, and two rotated factors for the girls
which did classify a significant number of Ss had
not been fully investigated by the time of that
report. Except for the addition of these factors,
the overall pattern of classification of Ss there did
not differ from the one presented here.

TABLE 1
Male Principal Factors

(a) First principal factor

Internalizing (positive end of      Externalizing (negative end of
first principal factor)             first principal factor)

.526 Phobias                        .632 Disobedient
.424 Stomachaches                   .555 Stealing
.382 Fearful                        .510 Lying
.363 Pains                          .492 Fighting
.344 Worrying                       .453 Cruelty
.343 Withdrawn                      .445 Destructive
.339 Nausea                         .399 Inadequate guilt feelings
.335 Obsessions                     .398 Vandalism
.330 Shy                            .372 Truancy
.329 Vomiting                       .362 Fire-setting
.304 Compulsions                    .342 Swearing
.302 Insomnia                       .317 Running away
.266 Crying                         .277 Temper tantrums
.263 Fantastic thinking             .275 Showing off
.259 Headaches                      .274 Hyperactive
.256 Seclusive                      .227 Sexual delinquency
.249 Apathy                         .201 Threatening people
.231 Depression                     .149 Negativistic
.227 Nightmares                     .144 Poor school work
.225 Nervous                        .133 Sexual perversions
.220 Refusing to eat                .121 Attention demanding
.185 Overtired                      .108 Enuresis
.182 Fears own impulses             .103 Encopresis
.154 Confused
.153 Self-conscious
.151 Obese
.135 Tics
.120 Bizarre behavior
.117 Stuttering
.107 Skin eruptions
.101 Asthma

(b) Second principal factor

Severe and Diffuse Psychopathology (unipolar)

.531 Bizarre behavior               .253 Swearing
.491 Fantastic thinking             .246 Phobias
.460 Temper tantrums                .246 Attention demanding
.418 Threatening people             .229 Moodiness
.379 Ideas of reference             .226 Can’t concentrate
.372 Insomnia                       .222 Vomiting
.359 Loudness                       .215 Quarrelsome
.326 Nightmares                     .215 Withdrawn
.320 Crying                         .212 Headaches
.268 Disobedient                    .212 Daydreaming
.268 Nausea                         .208 Poor motor coordination
.267 Destructive                    .207 Obsessions
.260 Fighting                       .207 Refusing to eat
.257 Cruelty


The number of principal factors obtained 
in the five analyses ranged from 23 to 28. In 
each analysis, the first principal factor was 
bipolar, and its eigenvalue was substantially
larger than those of the second and succeeding
factors. The pattern of symptom loadings
on the first principal factor was also
very similar throughout the five analyses. 
Congruence coefficients, calculated by the 
product-moment correlation formula (Har- 
mon, 1960), between the first principal fac- 
tor from the analysis of the entire sample 
and the first principal factors from each of 
the four subgroups ranged from .956 to .985, 
with a mean of .968. This indicated a highly 
reliable dimension which was similar in each 
of the relatively homogeneous subgroups 
and of which the first principal factor for 
the entire group was a very good representa- 
tive. Even relatively low factor loadings ap- 
peared to reflect the polar tendencies of this 
principal factor, and items with loadings as 
small as ±.100 are presented in Table 1.
The label Internalizing versus Externalizing 
was selected for this factor. The label is not 
intended to carry dynamic implications. It 
means only that the symptoms at the Ex- 
ternalizing end describe conflict with the 
environment, while those at the other end 
describe problems within the self. 
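
A minimal sketch of how two sets of factor loadings might be compared follows. The text says the congruence coefficients were "calculated by the product-moment correlation formula" without giving the exact computation, so the sketch shows both the uncentered congruence coefficient and the ordinary product-moment correlation. The restriction to symptoms present in both analyses, the function names, and the subgroup loadings are assumptions; only the whole-sample loadings are taken from Table 1, with Externalizing items entered as negative.

```python
import math


def shared_items(a, b):
    """Symptoms that have a loading in both analyses."""
    return sorted(set(a) & set(b))


def congruence(a, b):
    """Uncentered congruence: sum(xy) / sqrt(sum(x^2) * sum(y^2))."""
    items = shared_items(a, b)
    num = sum(a[i] * b[i] for i in items)
    den = math.sqrt(sum(a[i] ** 2 for i in items) * sum(b[i] ** 2 for i in items))
    return num / den


def product_moment(a, b):
    """Pearson correlation between the two loading patterns."""
    items = shared_items(a, b)
    n = len(items)
    mean_a = sum(a[i] for i in items) / n
    mean_b = sum(b[i] for i in items) / n
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in items)
    den = math.sqrt(sum((a[i] - mean_a) ** 2 for i in items)
                    * sum((b[i] - mean_b) ** 2 for i in items))
    return num / den


# Whole-sample loadings from Table 1 (Externalizing end entered as negative).
whole_sample = {"Phobias": 0.526, "Stomachaches": 0.424,
                "Disobedient": -0.632, "Stealing": -0.555, "Asthma": 0.101}
# Hypothetical subgroup loadings; "Asthma" was dropped in this subgroup.
subgroup = {"Phobias": 0.490, "Stomachaches": 0.455,
            "Disobedient": -0.600, "Stealing": -0.580}

print(congruence(whole_sample, subgroup))
print(product_moment(whole_sample, subgroup))
```

For a strongly bipolar factor whose loadings average close to zero, the two indices tend to give similar values.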
Although having a substantially smaller 
eigenvalue than the first principal factor 
(3.080 versus 5.269 in the analysis of the 
entire sample), the second principal factor 
appeared relatively consistent throughout 
the five analyses. It was unipolar, and congruence
coefficients between the second
principal factor from the analysis of the 
entire sample and those from the four sub- 
groups ranged from .722 to .911, with a 
mean of .790. Finding a descriptive label 
which commanded consensus was much 
more difficult for this factor than for the 
first one. Suggested labels included “severe 
psychopathology,” “extreme disorganization,”
and “diffuse psychopathological disorganization.”
Descriptively, it seemed to
include symptoms both of severe mental 
disturbance and of excessive belligerence. 
The label Severe and Diffuse Psychopathol- 
ogy was finally chosen. Items with loadings 
down to .200 are presented in Table 1. 
Rotated factors. Because the largest two 
factors were so similar in the five analyses, 
and the succeeding factors had relatively 
low eigenvalues (2.684 for the third factor, 
2.282 for the fourth in the analysis of the 
entire sample), the primary dimensions in
the principal-factor matrix from the entire
sample were assumed to be representative 
of the various subgroups, and this matrix 
was rotated to obtain the different simple 
structure solutions. All 28 principal factors 
were rotated to the varimax and quartimax 
criteria, and the 15 principal factors having 
the largest eigenvalues were rotated to the 
oblimin criterion. The resulting approxima- 
tions to simple structure were examined for 
factors which were identifiable in all or 
most of the solutions. Any pattern of symp- 
tom loadings which appeared consistently 
on a factor throughout the rotated solutions 
was concluded to represent a relatively re- 
liable factor. All variations of such a pat- 
tern of symptom loadings were then con- 
sidered together and ranked according to 
how representative of the group they 
seemed to be. No items with loadings below 
.100 were considered in the selection pro- 
cedure. A subjective combination of the fol- 
lowing criteria was used in choosing the best 
representative for each factor: (a) inclu- 
sion of the most symptoms which frequently 
appeared on the other variations of the fac- 
tor; (b) least overlap of symptoms with 
other reliable factors; (c) highest factor 
loadings. In addition, for each factor, a cut- 
off point was selected for loadings below 
which items would not be considered be- 
cause they began to overlap with another 
factor. The following example may help to 
illustrate this procedure: it was noticed 
that a factor on which “vomiting” and 
“nausea” had the highest loadings, and 
which usually included “stomachaches,” 
“headaches,” “pains,” and “phobias,” ap- 
peared in 15 of the 16 rotated solutions 
(varimax, quartimax, and 13 of the 14 
oblimin solutions). Each of the 15 varia- 
tions of this factor, including all symptoms 
with loadings of .100 and above, was typed 
on a separate card. Then the symptoms with 
the lowest loadings on each variation of the 
factor were examined to see if there was a
point where groups of symptoms consist- 
ently having high loadings on another group 
of rotated factors appeared together. It was 
found that “fantastic thinking,” “confused,” 
and “withdrawn,” which appeared together 
with relatively high loadings on another 
group of rotated factors, appeared together
with low loadings on most of the variations
of the “vomiting-nausea” factor. Therefore, 
“fantastic thinking” and items with lower 
loadings than “fantastic thinking” were 
dropped from further consideration in re- 
lation to this factor. The remaining symp- 
toms on each of the 15 variations were then 
used to rank the 15 according to the three 
criteria stated above. The third factor in 
the three-factor oblimin solution was chosen 
by this method as the best representative of 
the group of factors heavily loaded on 
“vomiting and nausea.” The four clinical 
psychologists who were later consulted for 
labeling the factors took into consideration 
the five variations which had been ranked 
as the five best representatives of each 
factor. 
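
The orthogonal rotations referred to above belong to the orthomax family, in which quartimax corresponds to a weight of 0 and varimax to a weight of 1. The sketch below rotates pairs of factors through the angle that maximizes the criterion for that pair and stops when no angle in a sweep reaches the limit. It is an illustration, not the UMSTAT 55 routine: treating the .009 radian limit as the stopping rule is an assumption, the loadings are invented, and the row normalization often applied before varimax is omitted.

```python
import math


def orthomax_criterion(loadings, gamma):
    """Sum over factors of sum(l^4) - gamma * (sum(l^2))^2 / p."""
    p = len(loadings)
    total = 0.0
    for j in range(len(loadings[0])):
        squares = [row[j] ** 2 for row in loadings]
        total += sum(s * s for s in squares) - gamma * sum(squares) ** 2 / p
    return total


def orthomax_rotate(loadings, gamma=1.0, angle_limit=0.009, max_sweeps=100):
    """Pairwise orthomax rotation (gamma=0 quartimax, gamma=1 raw varimax)."""
    rotated = [list(row) for row in loadings]
    p, k = len(rotated), len(rotated[0])
    for _ in range(max_sweeps):
        largest_angle = 0.0
        for a in range(k - 1):
            for b in range(a + 1, k):
                u = [row[a] ** 2 - row[b] ** 2 for row in rotated]
                v = [2.0 * row[a] * row[b] for row in rotated]
                sum_u, sum_v = sum(u), sum(v)
                c_term = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
                d_term = 2.0 * sum(ui * vi for ui, vi in zip(u, v))
                # Angle that maximizes the criterion for this pair of factors.
                angle = math.atan2(
                    d_term - gamma * 2.0 * sum_u * sum_v / p,
                    c_term - gamma * (sum_u ** 2 - sum_v ** 2) / p,
                ) / 4.0
                largest_angle = max(largest_angle, abs(angle))
                cos_a, sin_a = math.cos(angle), math.sin(angle)
                for row in rotated:
                    x, y = row[a], row[b]
                    row[a] = x * cos_a + y * sin_a
                    row[b] = -x * sin_a + y * cos_a
        if largest_angle < angle_limit:
            break
    return rotated


# Invented unrotated loadings for six items on two factors.
unrotated = [[0.6, 0.6], [0.7, 0.5], [0.5, 0.7],
             [0.6, -0.6], [0.7, -0.5], [0.5, -0.7]]
for row in orthomax_rotate(unrotated, gamma=1.0):
    print([round(value, 3) for value in row])
```

An orthogonal rotation leaves each item's communality unchanged, so only the distribution of an item's loadings across factors is altered.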

By the foregoing procedure eight rotated 
factors were found. These are presented in 
Table 2, and the rotated solution in which 
it appeared is indicated for each one. Com- 
plete consensus was attained in labeling 
four of them: 1. Somatic Complaints; 2. 
Delinquent Behavior; 3. Obsessions, Com- 
pulsions, and Phobias; and, 4. Sexual Prob- 
lems. Factor 5 was generally agreed to in- 
clude schizoid thinking and behavior, but 
some reservation was voiced due to the pos- 
sible implication that “schizoid” might 
imply relatively mild pathology while the 
symptoms on the factor included some that 
might be severe enough to be called schizo- 
phrenic. Schizoid Thinking and Behavior 
was selected with the qualification that 
nothing about intensity of behavioral de- 
viation is implied. Neither is any contrast 
with “schizophrenic” nor any implication 
about etiology intended. “Schizoid” would 
simply appear to be a more general term 
than “schizophrenic.” 

Factors 6 and 7 presented greater labeling 
difficulties. Factor 6 elicited suggestions of 
“unsocialized aggressive behavior,” “antisocial
aggression,” and “aggressive.” The
common element in these suggestions was
aggression, so the label Aggressive Behavior
was chosen, with the qualification that
some of the behavior included was not di- 
rect aggression against other persons. 

Factor 7 elicited suggestions of “uncontrolled
impulsivity and hyperactivity,”
“hyperactive agitation,” “tenseness, hyperactivity,
and irritability,” and “hyperactive-hypersensitive.”
Hyperactivity was the
obvious common element in all these suggestions,
and a common reference to overreactivity
to stimulation was also made.
With reservations, the term Hyperreactive
Behavior was finally chosen as being general
enough to include both the items
suggestive of hyperactivity and those suggestive
of impulsivity-agitation-irritability-hypersensitivity.
Factor 8 was found to
classify, by means of the procedure to be
described below, only two Ss out of the 300,
so it was left unnamed and was not retained
for further statistical analyses.

TABLE 2
Male Rotated Factors

1. Somatic Complaints (oblimin 3-3)     5. Schizoid Thinking and Behavior (quartimax 2)
.534 Vomiting                           .745 Fantastic thinking
.525 Nausea                             .577 Bizarre behavior
.517 Stomachaches                       .461 Insomnia
.501 Headaches                          .428 Loudness
.418 Pains                              .385 Crying
.374 Phobias                            .316 Confused
.371 Depression                         .267 Phobias
.299 Suicidal                           .231 Poor motor coordination
.297 Overtired                          .215 Withdrawn
.295 Ideas of reference                 .203 Refusing to eat
.283 Dizziness                          .202 Ideas of reference
.276 Insomnia                           .198 Overtired
.263 Withdrawn                          .189 Quarrelsome
.256 Diplopia                           .187 Stuttering
.207 Worrying                           .186 Nightmares
.191 Nightmares                         .177 Obsessions
.183 Shy                                .172 Temper tantrums
.164 Obese                              .170 Headaches
.162 Complains no one loves him         .146 Daydreaming
.156 Fearful                            .134 Fearful
                                        .114 Can't concentrate

2. Delinquent Behavior (varimax 1)      6. Aggressive Behavior (quartimax 14)
.719 Truancy                            .813 Cruelty
.697 Running away                       .554 Threatening people
.509 Vandalism                          .546 Destructive
.493 Lying                              .402 Inadequate guilt feelings
.479 Inadequate guilt feelings          .373 Vandalism
.476 Disobedient                        .300 Disobedient
.327 Fire-setting                       .215 Suicidal
.243 Fighting                           .194 Lying
.238 Cruelty                            .183 Ideas of reference
.196 Swearing                           .157 Temper tantrums
.179 Sexual delinquency                 .139 Fire-setting
.162 Destructive                        .134 Stealing
.132 Poor school work                   .124 Truancy
.102 Showing off                        .124 Fighting

3. Obsessions, Compulsions, and         7. Hyperreactive Behavior
   Phobias (oblimin 4-2)                   (oblimin 3-3, negative end)
.647 Obsessions                         .494 Hyperactive
.545 Compulsions                        .360 Can't concentrate
.465 Fears own impulses                 .282 Attention demanding
.448 Fearful                            .216 Loudness
.390 Tics                               .209 Quarrelsome
.350 Stuttering                         .182 Tics
.329 Phobias                            .159 [illegible]
.318 Nervous                            .158 Encopresis
.309 Seclusive                          .129 Fighting
[illegible] Fantastic thinking          .115 Poor school work
.295 Self-conscious                     .102 Temper tantrums
.294 Daydreaming
.287 Insomnia
.258 Worrying
.243 Withdrawn
.217 Nightmares

4. Sexual Problems (oblimin 15-4)       8. (Unnamed) (oblimin 11-11)
.607 Masturbation                       .595 Constipation
.500 Sexual preoccupation               .463 Nailbiting
.451 Sexual delinquency                 .449 Encopresis
.353 Overtired                          .401 Dizziness
.297 Thumbsucking                       .352 Destructive
.217 Swearing                           .340 Enuresis
.174 Lying                              .194 Fire-setting
.173 Enuresis                           .173 Diplopia
.165 Withdrawn                          .173 Fighting
.163 Sexual perversions                 .164 Poor motor control
                                        .160 Shy
                                        .153 Skin eruptions
                                        .147 Stomachaches
                                        .128 Insomnia
                                        .126 Overtired

Note.—The source of each factor is indicated in parentheses: e.g., (oblimin 4-2) means the factor
was the second factor in the four factor oblimin solution; (varimax 1) means it was the first factor
in the varimax solution.

Second-order factors. The goal of obtain- 
ing higher order groupings of symptoms was 
to be attained by means of orthogonal factor
analyses of the intercorrelations between
first-order oblimin factors. An initial
difficulty, however, was that a choice of
oblimin solution had to be made since there
were 14 solutions available. Because not all
the oblimin factors chosen as being the best
representatives of their respective groups
came from the same solution, no single solution
could be chosen by that particular criterion.
It was decided, therefore, to do second-order
analyses of two oblimin solutions
which were not highly similar but which 
contained good examples of most of the re- 
liable factors. The four- and eight-factor 
oblimin solutions were chosen, and the fac- 
tor intercorrelations were analyzed by the 
principal-factor method with quartimax and 
varimax rotations. The principal-factor so- 
lution was similar to the rotated solutions 
in both of these analyses. In the second- 
order analysis of the four-factor oblimin 
solution, two factors appeared. In all three 
second-order orthogonal solutions, the first 
factor was bipolar, with high loadings on
the Somatic Complaints and Obsessions,
Compulsions, and Phobias factors at one 
end and a high loading on the Delinquent 
Behavior factor at the other end. The sec- 
ond factor in each solution was unipolar, 
with the highest loading on the Schizoid 
Thinking and Behavior factor. The three 
second-order solutions for the eight oblimin 
factors were also very similar. They pro- 
duced three factors, the first of which was 
bipolar with high loadings on the Somatic 
Complaints and Obsessions, Compulsions, 
and Phobias factors at one end and a high 
loading on Sexual Problems at the other 
end. The third factor tended to be unipolar 
although the smaller pole had some rela- 
tively large loadings. Hyperreactive Behav- 
ior had the heaviest loading on this factor. 

To obtain loadings for specific symptoms 
on the second-order factors, the obvious 
procedure is to multiply the loadings of the 
symptoms of a first-order factor by the 
loading that factor has on a second-order 
factor. Drawing conclusions from the re- 
sults of this procedure is difficult because 
any symptoms which have loadings on more 
than one first-order factor may be given 
several different second-order loadings. The 
situation is further complicated when some 
of these second-order loadings are negative 
and some positive, which may occur if a 
symptom appears at the positive end of one 
first-order factor having a positive loading 
on a second-order factor and the same 
symptom appears at the negative end of 
another first-order factor also having posi- 
tive loadings on the same second-order fac- 
tor. A similar situation may arise if a symp- 
tom has positive loadings on two first-order 
factors, one of which is loaded positively 
and the other negatively on a given second-order
factor. Empirically, these possibilities
may be somewhat remote among symptoms
with high first-order loadings, but they
did occasionally arise for symptoms with 
smaller loadings. To simplify the results 
somewhat, only positive first-order loadings 
were multiplied by the second-order load- 
ings since the oblimin factors generally 
tended toward unipolarity with their heav- 
iest loadings on the positive end. Also, only 
first-order factors with second-order load- 
ings of .300 or above were included in the 
calculations. 
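
The multiplication procedure and its two simplifications can be sketched as follows. The first-order loadings are taken from Table 2, but the second-order loadings are hypothetical, since none are reported in the text. The sketch also assumes that the .300 threshold applies to the absolute value of a second-order loading, which the text does not state.

```python
def second_order_symptom_loadings(first_order, second_order, min_second=0.300):
    """Multiply positive first-order loadings by second-order factor loadings.

    first_order:  {factor name: {symptom: first-order loading}}
    second_order: {factor name: loading on one second-order factor}
    Returns {symptom: [one product per contributing first-order factor]}.
    """
    products = {}
    for factor, factor_loading in second_order.items():
        if abs(factor_loading) < min_second:
            continue  # first-order factor too weakly related to be included
        for symptom, loading in first_order[factor].items():
            if loading <= 0:
                continue  # only positive first-order loadings are used
            products.setdefault(symptom, []).append(loading * factor_loading)
    return products


# First-order loadings from Table 2 (a few items per factor).
first_order = {
    "Somatic Complaints": {"Vomiting": 0.534, "Phobias": 0.374},
    "Obsessions, Compulsions, and Phobias": {"Obsessions": 0.647, "Phobias": 0.329},
    "Delinquent Behavior": {"Truancy": 0.719},
    "Schizoid Thinking and Behavior": {"Fantastic thinking": 0.745},
}
# Hypothetical loadings of those factors on a bipolar second-order factor.
second_order = {
    "Somatic Complaints": 0.60,
    "Obsessions, Compulsions, and Phobias": 0.55,
    "Delinquent Behavior": -0.50,
    "Schizoid Thinking and Behavior": 0.10,
}

for symptom, values in second_order_symptom_loadings(first_order, second_order).items():
    print(symptom, [round(v, 3) for v in values])
```

In this toy input "Phobias" receives two different second-order loadings because it appears on two first-order factors, which is the ambiguity described above.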

There was an obvious similarity between 
the bipolar groupings on the first-order first 
principal factor (Table 1) and the bipolar 
groupings on the largest second-order fac- 
tor for both the four- and eight-factor ob- 
limin solutions. There is good reason for ex- 
pecting this similarity between the first 
principal factor and the first unrotated second-order
factor. Rimoldi (1951; cf. Fruchter,
1954, p. 172) subjected the same intercorrelation
matrix of reasoning tests both
to multiple-group and to Spearman type 
two-factor analyses. The intercorrelations 
of the factors obtained by the multiple- 
group method were analyzed to three cen- 
troid factors. The loadings of the tests on 
the first unrotated second-order factor were 
found to be approximately proportional to 
their loadings on the g factor produced by 
the two-factor method. Since the first prin- 
cipal factor in a principal-factor solution is 
similar to a Spearman g factor (Burt, 1938; 
cf. Harmon, 1960, p. 163), and since the 
multiple-group method applied to variables 
with well-known properties, such as the 
tests used by Rimoldi, can resemble oblique 
rotations to simple structure, it is not sur- 
prising that the first principal factor and 
first unrotated second-order factor found in 
the present study were similar in their pat- 
terns of item loadings. It would seem intui- 
tively probable that a g type first-order 
factor should be similar to the largest sec- 
ond-order factor if they both accurately re- 
flect the most central dimension in the cor- 
relation matrix. In any event, for practical 
purposes, the first principal factor appear- 
ing in the present data can probably be 
regarded as representing the primary di- 
mension in the data better than do any of 
the possible second-order factors for the
following reasons: (a) as a least squares
solution, the first principal factor is mathe- 
matically unique whereas the oblimin solu- 
tion on which second-order factors are based 
is a mathematical approximation to the 
criteria for simple structure which them- 
selves do not determine a mathematically 
unique solution; (b) there are many oblimin 
solutions from which the second-order fac- 
tors can be obtained and there are no pre- 
cise criteria for selecting one as the most 
appropriate; (c) multiplying the first-order 
item loadings by the second-order factor 
loadings yields more than one value for 
some items and may yield positive and 
negative second-order loadings for the same 
item; and, (d) the first principal factor 
here did not suffer from weaknesses usually 
attributed to unrotated factors as com- 
pared to rotated ones—specifically, lack of 
meaningfulness and lack of stability when 
items are dropped; the first principal factor 
reflected a dichotomy which was descrip- 
tively consistent and psychologically mean- 
ingful without rotation, and, as evidenced 
by the high congruence coefficients reported 
above, it was barely altered by dropping 
as many as 12 items and by analyzing dif- 
ferent subgroups of Ss. 

Classification of Ss by the factors. To de- 
termine what meaning the factors had for 
the classification of individual cases and 
what the relationship was between the 
general Internalizing-Externalizing dimen- 
sion and the rotated factors, cases were 
classified according to their standing on the 
various factors reported in Tables 1 and 2. 
The classification of cases solely on the 
basis of factor scores was deemed inap- 
propriate since each case had many zero 
entry items (symptoms not reported as 
present). This would have caused factor 
scores to have been badly confounded with 
the number of symptoms reported, which in 
turn may occasionally have been a function 
of the completeness of the case record. 
Furthermore, since factors were taken from 
different solutions, their loadings were not 
necessarily comparable on a single absolute 
scale. 

To avoid these two sources of error, it 
was decided to classify Ss according to an 
arbitrarily selected degree of resemblance
to a factor, in terms of the percentage of
their symptoms matching that factor. First,
if 60% or more of an S's symptoms came
from the internalizing end of the first principal
factor, he was assigned to the Internalizing
category. Likewise, if 60% or
more of his symptoms came from the externalizing
end, he was assigned to the Externalizing
category. By this criterion, 68
cases were assigned to the Internalizing 
category, 128 cases were assigned to the 
Externalizing category, and 104 cases were 
left unclassified. The large number of un- 
classified cases is not simply a result of 
almost equal proportions of Internalizing 
and Externalizing symptoms occurring in
all these cases. Only the 54 symptoms with
loadings of ±.100 were used in classifying
cases, but the 20 symptoms having loadings
below ±.100 also contributed to the denominator
of the proportion calculated for each
S. Cases whose symptoms included some of 
these 20 might therefore fail to meet the 
60% criterion, even though the numbers of 
Internalizing and Externalizing symptoms 
were far from equal. Several other means of 
classification could also have been at- 
tempted. For example, a continuous scoring 
based on the precise percentage of symptoms 
from each group could have been employed. 
The precision of the data was not thought 
to warrant such scoring, however. Future applications
of the present factors to live patients
should seek quantitative refinements.

Fig. 1. Classification of male Ss by first principal and rotated factors. (One “obsessive” and two
“somatic” Ss came from the Unclassified group.)

Fig. 2. Classification of male Ss by first and
second principal factors.

The second step was to classify cases by 
their resemblance to the rotated factors. 
Again, if 60% or more of an S’s symptoms 
appeared on a rotated factor, he was as- 
signed to the category represented by that 
factor. However, unlike the mutually exclu- 
sive polar groupings of symptoms on the 
principal factor, the groupings of symptoms 
on the rotated factors overlapped, raising 
the possibility of tied scores. If an S had the 
same percentage of symptoms (at or above 
the 60% criterion) on two or more rotated 
factors, factor scores were calculated, and 
the S was assigned the category for which 
his factor score was highest. Figure 1 pre- 
sents the number of Ss classified by the 
principal factor, by each of the rotated fac- 
tors, and the relationship between the two 
levels of classification. 
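
The percentage rule used for the first principal factor can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `classify_case` and the miniature symptom sets are hypothetical, and the tie-breaking by factor scores used for the overlapping rotated factors is not shown. As in the text, every symptom recorded for an S counts in the denominator, including symptoms that belong to neither pole.

```python
def classify_case(symptoms, internalizing, externalizing, criterion=0.60):
    """Assign one case by the share of its symptoms falling at each pole.

    `symptoms` is the full set recorded for one S. Symptoms belonging to
    neither pole still count in the denominator, so a case can remain
    Unclassified even when one pole clearly outnumbers the other.
    """
    if not symptoms:
        return "Unclassified"
    total = len(symptoms)
    share_internalizing = len(symptoms & internalizing) / total
    share_externalizing = len(symptoms & externalizing) / total
    if share_internalizing >= criterion:
        return "Internalizing"
    if share_externalizing >= criterion:
        return "Externalizing"
    return "Unclassified"


# Hypothetical miniature poles; the male analysis used 54 pole symptoms.
INTERNALIZING = {"nausea", "headaches", "phobias"}
EXTERNALIZING = {"lying", "stealing", "fighting"}

case_a = {"nausea", "headaches", "phobias", "lying"}     # 3 of 4 internalizing
case_b = {"lying", "stealing", "nausea", "daydreaming"}  # 2 of 4 externalizing

print(classify_case(case_a, INTERNALIZING, EXTERNALIZING))  # Internalizing
print(classify_case(case_b, INTERNALIZING, EXTERNALIZING))  # Unclassified
```

The second case shows why many Ss stayed unclassified: externalizing symptoms outnumber internalizing ones two to one, yet the low-loading symptom keeps the share below 60%.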

Since the second principal factor also 
showed some consistency in the several subgroup
analyses and gave some evidence of
appearing in the second-order analyses, it 
too was used to classify cases by the 60% 
criterion. Because it was a unipolar factor 
and generally showed less stability than the 
first principal factor, only symptoms with 
loadings of .200 and above, as presented in 
Table 1, were used for the classification pro- 
cedure. Sixty-five cases were found to meet 
the 60% criterion, while the other 235 did 
not. A comparison of the classification of 
cases by the first principal factor and by the 
second principal factor (Figure 2) shows 
that they were clearly orthogonal to one an- 
other. The cases falling into the category 
of Severe and Diffuse Psychopathology were 
distributed in the Internalizing, Externaliz- 
ing, and Unclassified categories of the first 
principal factor roughly proportionally to 
the total number of cases in those cate- 
gories. 


FEMALES 


The Sample 


The chronological span of admission dates 
was 1951 through 1964. Nine of the Ss had 
been adopted, the oldest age at adoption 
being 7 months. Only two of the cases could 
be considered institutionalized prior to re- 
ferral. Both of these had been living in state- 
supported boarding homes, one for 2 months 
and one for 1 year. Seventy-three symptoms 
occurred five or more times, and these are 
indicated by the letter F on the symptom 
check list in Appendix A. Counting only 
these 73 symptoms, the mean number of 
symptoms recorded per case was 7.69. 


Factor Analyses 


Exactly the same factor analytic proce- 
dures as with the males were followed. The 
median split on age resulted in two groups 
of 150 Ss each, ranging in age from 4-11 and 
12-15. In the younger group 60 symptoms 
occurred four or more times, and in the 
older group 65 symptoms occurred four or 
more times. The median split on social class 
produced a group of 160 Ss in the lower 
three classes and 130 in the upper three 
classes, with social class data being un- 
available for 10 Ss. Sixty-six symptoms in 
the lower class group and 64 symptoms in
the upper class group occurred with a frequency
of four or greater. Approximately
equal numbers of lower class and of upper 
class Ss appeared in each age group, indi- 
cating that the social class and age dichoto- 
mies were independent. 

Principal factors. The number of princi- 
pal factors obtained in the five analyses 
ranged from 24 to 28. As with the males, the 
eigenvalue of the first principal factor was 
substantially larger than that of the second 
factor, the pattern of symptom loadings was 
similar throughout the five analyses, and 
the bipolar distribution of symptoms repre- 
sented a dichotomy similar to that found 
for the males. Congruence coefficients be- 
tween the first principal factor from the 
analysis of the entire sample and the first 
principal factor from each of the four sub- 
groups ranged from .848 to .967, with a 
mean of .922. The factor has been given the 
same name as that for the males and is pre- 
sented in Table 3. 
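
For readers unfamiliar with the statistic, a congruence coefficient between two factors can be computed as below. The sketch assumes the conventional definition (sum of cross-products of loadings divided by the square root of the product of the sums of squared loadings); the text does not spell out the formula, and the loading vectors shown are hypothetical.

```python
import math


def congruence(loadings_a, loadings_b):
    """Coefficient of congruence between two loading vectors.

    Both vectors must list loadings for the same symptoms in the same order.
    """
    if len(loadings_a) != len(loadings_b):
        raise ValueError("loading vectors must have equal length")
    cross = sum(a * b for a, b in zip(loadings_a, loadings_b))
    norm = math.sqrt(sum(a * a for a in loadings_a) *
                     sum(b * b for b in loadings_b))
    return cross / norm


# Hypothetical loadings of four symptoms on a factor in two analyses.
whole_sample = [0.52, 0.49, -0.56, -0.51]
subgroup = [0.48, 0.41, -0.60, -0.44]
print(round(congruence(whole_sample, subgroup), 3))
```

A value near 1 indicates that the two factors order and weight the symptoms in nearly the same way; a value near 0 indicates little resemblance.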

The second principal factor was less con- 
sistent than in the male analyses. Like the 
males’ factor, it tended to be unipolar and 
difficult to label descriptively. It appeared 
to include many of the same extremely 
pathological items as the males’ factor, but 
without the more aggressive behaviors. Se- 
vere and Diffuse Psychopathology seemed 
to apply as well to this factor as to the 
males’ factor, although it was clear that 
this factor contained more symptoms found 
at the Internalizing pole of the first princi- 
pal factor than did the male factor, which 
included more symptoms from the Exter- 
nalizing pole of the male first principal fac- 
tor. The congruence coefficients between the 
second principal factor from the entire sam- 
ple and those from the subgroups were rela- 
tively low, ranging from .110 to .538, with 
a mean of .420. Three of these coefficients 
were in the .50's however. The factor which 
had a congruence coefficient of only .110 ap- 
peared in the analysis of the upper social 
class female data. It tended to be bipolar 
with symptoms similar to those heavily 
loaded on the other second principal factors 
at one end and a few symptoms like “nau- 
sea,” “stomachaches,” and “inappropriately 
indifferent” heavily loaded at the other end. 
Despite the low congruence coefficient, 14

TABLE 3
FEMALE PRINCIPAL FACTORS

(a) First principal factor

Internalizing (negative end of first principal factor)
.521 Nausea
.494 Pains
.483 Headaches
.448 Stomachaches
.426 Phobias
.346 Vomiting
.324 Diplopia
.295 Refusing to eat
.290 Obsessions
.285 Fearful
.281 Withdrawn
.274 Depression
.264 Dizziness
.259 Crying
.242 Nightmares
.239 Nervous
.233 Worrying
.230 Insomnia
.228 Constipation
.223 Fears own impulses
.221 Breathing difficulty
.210 Compulsions
.209 Shy
.183 Overtired
.166 Self-conscious
.161 Confused
.150 Fantastic thinking
.149 Inappropriately indifferent
.142 Tics
.197 Skin eruptions
.115 Feelings of worthlessness
.111 Obese

Externalizing (positive end of first principal factor)
.562 Disobedient
.512 Lying
.447 Stealing
.387 Fighting
.323 Running away
.320 Swearing
.317 Quarrelsome
.281 Threatening people
.267 Truancy
.249 Destructive
.227 Poor school work
.224 Attention demanding
.224 Sexual delinquency
.212 Inadequate guilt feelings
.184 Sexual preoccupation
.176 Thumbsucking
.174 Masturbation
.165 Enuresis
.155 Temper tantrums
.148 Negativistic
.147 Nailbiting
.115 Hyperactive
.106 Poor motor coordination

(b) Second principal factor

Severe and Diffuse Psychopathology (unipolar)
.478 Withdrawn
.406 Bizarre behavior
.388 Confused
.381 Depression
.366 Ideas of reference
.352 Crying
.338 Fearful
.330 Fears own impulses
.328 Compulsions
.328 Can't concentrate
.324 Insomnia
.313 Daydreaming
.310 Temper tantrums
.306 Fantastic thinking
.296 Obsessions
.288 Negativistic
.270 Moodiness
.265 Destructive
.262 Hyperactive
.233 Seclusive
.233 Feelings of worthlessness
.229 Phobias
.221 Excessive talking


out of the 22 symptoms with loadings above 
.200 on the larger pole were also loaded
above .200 on the pole of the factor from the 
entire sample. The factor from the entire

TABLE 4
FEMALE ROTATED FACTORS

1. Somatic Complaints (oblimin 5-3)
.674 Headaches
.599 Stomachaches
.597 Nausea
.566 Pains
.493 Vomiting
.493 Dizziness
.450 Breathing difficulty
.423 Diplopia
.307 Inappropriately indifferent
.305 Overtired
.297 Fainting
.173 Tics
.152 Refusing to eat
.150 Constipation
.144 Phobias
.135 Nervous

2. Delinquent Behavior (quartimax 6, negative end)
.710 Lying
.710 Stealing
.391 Inadequate guilt feelings
.297 Disobedient
.274 Masturbation
.237 Attention demanding
.222 Nailbiting
.220 Truancy
.136 Destructive
.133 Fighting
.112 Temper tantrums

3. Obsessions, Compulsions, and Phobias (oblimin 7-2)
.590 Fears own impulses
.577 Obsessions
.546 Compulsions
.453 Phobias
.390 Crying
.365 Refusing to eat
.358 Feelings of worthlessness
.346 Depression
.343 Confused
.291 Fearful
.280 Insomnia
.279 Withdrawn
.255 Worrying
.225 Moodiness
.192 Nightmares
.180 Refusing to talk
.167 Constipation

4. Schizoid Thinking and Behavior (oblimin 10-8)
.662 Fantastic thinking
.628 Bizarre behavior
.410 Confused
.401 Ideas of reference
.366 Negativistic
.329 Seclusive
.255 Apathy
.240 Fighting
.182 Destructive
.179 Fearful
.173 Poor school work
.172 Headaches
.170 Poor motor coordination
.162 Withdrawn
.150 Daydreaming
.146 Insomnia
.137 Sexual preoccupation
.128 Refusing to talk
.124 Constipation
.123 Encopresis
.120 Obsessions

5. Aggressive Behavior (quartimax 22, negative end)
.666 Swearing
.625 Threatening people
.583 Fighting
.555 Destructive
.441 Disobedient
.312 Temper tantrums
.208 Suicidal
.179 Truancy
.162 Attention demanding
.158 Quarrelsome

6. Hyperreactive Behavior (oblimin 4-4)
.582 Hyperactive
.487 Can't concentrate
.343 Nervous
.323 Tics
.313 Attention demanding
.295 Thumbsucking
.274 Nailbiting
.274 Excessive talking
.230 Destructive

7. Depressive Symptoms (oblimin 11-9)
.650 Depression
.516 Moodiness
.458 Withdrawn
.447 Suicidal
.364 Temper tantrums
.279 Crying
.262 Refusing to eat
.226 Running away
.220 Shy
.212 Fearful
.210 Compulsions
.205 Complains no one loves him
.184 Self-conscious
.149 Daydreaming
.140 Breathing difficulty
.130 Feelings of worthlessness
.129 Refusing to talk

8. Neurotic and Delinquent Behavior (oblimin 9-6)
.430 Truancy
.362 Poor school work
.358 Running away
.329 Overtired
.322 Sexual delinquency
.317 Asthma
.304 Disobedient
.295 Suicidal
.243 Skin eruptions
.207 Depression
.204 Fainting
.150 Breathing difficulty
.135 Poor motor coordination
.134 Daydreaming
.129 Confused
.123 Swearing
.108 Can't concentrate
.105 Apathy
.102 Lying

9. Obesity (oblimin 5-5)
.666 Obese
.629 Self-conscious
.607 Overeating
.392 Shy
.324 Withdrawn
.312 Depression
.263 Daydreaming
.217 Loneliness
.199 Complains no one loves him
.192 Suicidal
.154 Pains
.140 Nausea
.133 Overtired

10. Anxiety Symptoms (oblimin 9-4)
.535 Insomnia
.381 Crying
.359 Phobias
.347 Worrying
.340 Skin eruptions
.325 Nightmares
.297 Asthma
.266 Vomiting
.260 Nausea
.236 Picking
.186 Nervous
.173 Destructive
.137 Thumbsucking
.121 Fearful
.107 Poor motor coordination
.105 Refusing to eat

11. Enuresis and Other Immaturities (oblimin 14-3, negative end)
.490 Enuresis
.291 Thumbsucking
.263 Refusing to eat
.249 Stuttering
.203 Encopresis
.195 Shy
.165 Masturbation
.151 Apathy
.146 Destructive
.129 Refusing to talk
.123 Overeating
.121 Constipation
.109 Temper tantrums


sample is presented in Table 3. The third 
and succeeding factors had relatively small 
eigenvalues compared to the first two prin- 
cipal factors (4.479 for the first, 3.321 for 
the second, 2.647 for the third, and 2.375 for 
the fourth). 

Rotated factors. The same rotational and 
selection procedures used with the males’ 
data yielded 11 factors which were consid- 
ered reliable (Table 4). Five of these were 
agreed by the clinical consultants to re- 
semble reliable factors found for the males, 
and they were therefore given the same de- 
scriptive labels. These were: 1. Somatic 
Complaints; 2. Delinquent Behavior; 3. 
Obsessions, Compulsions, and Phobias; 4. 
Schizoid Thinking and Behavior; and 5. Ag- 
gressive Behavior. Factor 6 presented among 
its more heavily loaded items a pattern 
somewhat similar to that of the male factor 
labeled Hyperreactive Behavior and was 
given the same label, with the qualification 
that more symptoms from the Internalizing 
group were present. Factor 7 was agreed to 
be well described by the label Depressive 
Symptoms. Factor 8 was agreed to include
both symptoms commonly labeled neurotic
and behaviors labeled delinquent. Some res- 
ervations about the label Neurotic and De- 
linquent Behavior were expressed, but, since 
no other label was suggested, this was re- 
tained. Obesity was agreed to be the most 
applicable single descriptive term for Fac- 
tor 9, although it was suggested that the 
other items on the factor were also descriptive
of social inadequacy and the oral depressive
syndrome. Because it was difficult
to choose a term to describe this aspect of 
the factor, the label Obesity was applied 
with the qualification that the factor included
several symptoms and behaviors besides
the physical condition of obesity.
There was consensus that Factor 10 was 
composed mainly of symptoms usually at- 
tributed to anxiety, so it was labeled Anxi- 
ety Symptoms. The labeling of Factor 11 
presented considerable difficulty. It was 
pointed out that it included several symp- 
toms often listed under Special Symptom 
Reaction in the Standard Nomenclature. It 
was also pointed out that the items reflected 
a common immaturity or lack of control in
regulatory functions. Enuresis was by far
the most heavily loaded symptom on the 
several variations of the factor. It bore 
some similarity to the unnamed Factor 8 
found in the male analyses. Since enuresis 
was clearly the most prominent item, and 
there was some agreement as to the imma- 
ture quality of the other items, the label 
Enuresis and Other Immaturities was fi- 
nally selected. 

Second-order factors. As was done with 
the male data, the four- and eight-factor 
oblimin solutions for the female data were 
subjected to second-order orthogonal analyses.
The two second-order orthogonal factors
obtained from the four oblimin factors
were very similar in the principal factor,
varimax, and quartimax solutions. The two 
largest second-order factors obtained from 
the eight oblimin factors were also very 
similar for the three orthogonal solutions, 
while the third orthogonal factor was con- 
sistent only with respect to the most heavily 
loaded first-order factors. The largest sec- 
ond-order principal factor for both the four- 
and eight-factor oblimin solutions was bi- 
polar and resembled the dichotomy found 
in the first-order first principal factor. The 
second-order second principal factor was 
unipolar in both cases and showed some re- 
semblance to the first-order second principal 
factor. 


Fig. 3. Classification of female Ss by first principal and rotated factors. (One “obsessive” and one
“anxiety” S came from the Unclassified group; one “enuresis” S came from the Externalizing group;
one “enuresis” and one “neurotic and delinquent” S came from the Internalizing group.)

Classification of Ss by the factors. For the
reasons advanced earlier, it was decided 
that the first-order first principal factor was 
the best representative of the most primary 
dimension in the matrix of symptom inter- 
correlations. Replication of the classifica- 
tion procedure employed with the males re- 
sulted in the assignment of 143 females to 
the Internalizing category, 63 to the Exter- 
nalizing category, and 94 to the Unclassified 
group. Figure 3 presents the number of cases 
assigned by the same method to each of the 
categories represented by the 11 rotated fac- 
tors and the relationship between classifica- 
tion by the first principal factor and by the 
rotated factors. 

The second principal factor was also used 
to classify cases according to the 60% cri- 
terion, and 49 cases were found to meet this 
criterion. Figure 4 indicates that classifica- 
tion by the first principal factor was roughly 
orthogonal to that by the second principal 
factor. 


Fig. 4. Classification of female Ss by first and
second principal factors.


Relationship of Biographical Variables to
Factorial Classifications³


The Internalizing versus Externalizing 
classification would appear to offer the most 
meaningful comparisons on the biographical 
variables because of the mutually exclusive 
categories it provides and the large num- 
ber of Ss which fall into each category. For 
virtually all variables where Internalizers 
differed significantly from Externalizers, the 
distribution of Unclassified Ss fell midway 
between the other two. 

Since rough measures of “social compe- 
tence” have been found to be related to the 
symptom patterns of adult psychiatric pa- 
tients (Phillips & Zigler, 1961; Zigler & 
Phillips, 1960) and to personality variables 
in psychiatric and nonpsychiatric patients
(Achenbach & Zigler, 1963), the present 
study sought to analyse the relationship be- 
tween various social competence indicators 
and the juvenile symptom patterns found. 
The studies just cited usually employed the 
variables of occupation, age, education, 
marital status, employment history, and intelligence
as social competence indicators.
However, there are clearly at least two general
variables indicative of sociocultural attainment
represented by this type of index.
The occupation and education scores repre- 
sent the traditional concept of social class, 
while marital status and employment his- 
tory may be rough indicators of personality 
adjustment. Age and intelligence probably 
correlate with both of these general indi- 
cators of different types of sociocultural at- 
tainment. While the concept of social com- 
petence in developmental theory assumes 
that all six variables correlate in varying 
degrees with an underlying pattern of adap- 
tive adequacy, it was considered useful for 
the study of child symptoms to separate so- 
cial class from adaptive adequacy. Because 
the social class of the child would have been 
the product of his parents’ behavior and not 
his own, it could not be regarded as a meas- 
ure of his adaptive adequacy. 


3 All p values are for two-tailed tests; chi-square
tests are presented in Table 5 and all 2 × 2
chi-square values are corrected for continuity; t
tests are presented in Table 6; Ns vary because 
not all records provided data on all variables.

TABLE 5
CHI-SQUARE TESTS

(Int. = Internalizers; Ext. = Externalizers)

                                                 Males                               Females
Variable                           Int.  Ext. Total  df      χ²      p    Int.  Ext. Total  df      χ²      p

1. School performance
   Below average                     17    62    79   2  21.806  <.001      20    28    48   2  19.657  <.001
   Average                           15    25    40                         47    16    63
   Above average                     13     3    16                         33     7    40
   Total                             45    90   135                        100    51   151
2. Child's previous problems
   (a) Number
      None                           54    45    99   1  33.442  <.001     108    29   137   1  17.590  <.001
      1 or more                      12    78    90                         24    28    52
      Total                          66   123   189                        132    57   189
   (b) Expelled from school
      No                             65    93   158   1  14.766  <.001     131    49   180  expected f too small
      Yes                             1    30    31                          1     8     9
      Total                          66   123   189                        132    57   189
   (c) Police
      No                             64    88   152   1  16.058  <.001     130    41   171   1  29.569  <.001
      Yes                             2    35    37                          2    16    18
      Total                          66   123   189                        132    57   189
   (d) Psychiatric
      No                             64   112   176   1   1.512   <.30     127    51   178  expected f too small
      Yes                             2    11    13                          5     6    11
      Total                          66   123   189                        132    57   189
   (e) School failure
      No                             55    83   138   1   4.704   <.05     113    50   163   1      <1   n.s.
      Yes                            11    40    51                         19     7    26
      Total                          66   123   189                        132    57   189
3. Number of parental social problems
   (a) Father
      0                              35    37    72   2   8.674   <.02      51    14    65   2   5.360   <.07
      1                              15    38    53                         43    20    63
      More than 1                     7    24    31                         21    16    37
      Total                          57    99   156                        115    50   165
   (b) Mother
      0                              46    42    88   2  19.797  <.001      73    18    91   2  14.412  <.001
      1                              11    49    60                         31    27    58
      More than 1                     5    17    22                          8     8    16
      Total                          62   108   170                        112    53   165
   (c) Both parents combined
      0                              37    25    62   3  27.844  <.001      46    12    58   3  13.714   <.01
      1                               9    16    25                         26     6    32
      2                              10    52    62                         32    20    52
      More than 2                     9    27    36                         20    19    39
      Total                          65   120   185                        124    57   181
4. Parental problem categories
   (a) Father
      Alcoholism
         No                          45    71   116   1      <1   n.s.      86    32   118   1   1.495   <.30
         Yes                         12    28    40                         29    18    47
         Total                       57    99   156                        115    50   165
      Divorce
         No                          51    85   136   1      <1   n.s.     105    35   140   1  10.702   <.01
         Yes                          6    14    20                         10    15    25
         Total                       57    99   156                        115    50   165
      Psychiatric history
         No                          52    80   132   1   2.270   <.20      90    45   135   1   2.487   <.20
         Yes                          5    19    24                         25     5    30
         Total                       57    99   156                        115    50   165
      Unemployment
         No                          52    76   128   1   4.201   <.05      96    41   137   1      <1   n.s.
         Yes                          5    23    28                         19     9    28
         Total                       57    99   156                        115    50   165
   (b) Mother
      Divorce
         No                          55    77   132   1   5.915   <.02      97    37   134   1   5.596   <.02
         Yes                          7    31    38                         15    16    31
         Total                       62   108   170                        112    53   165
      Illegitimate child
         No                          59    94   153   1   2.109   <.20     106    49   155  expected f too small
         Yes                          3    14    17                          6     4    10
         Total                       62   108   170                        112    53   165
      Psychiatric history
         No                          53    82   135   1   1.655   <.20      86    40   126   1      <1   n.s.
         Yes                          9    26    35                         26    13    39
         Total                       62   108   170                        112    53   165
5. Lives with
   Both natural parents              52    76   128   1   4.999   <.05      97    33   130   1   3.845   <.05
   Other                             16    52    68                         46    30    76
   Total                             68   128   196                        143    63   206
6. Parents' attitude toward child's problem
   (a) Father
      Concerned                      38    49    87   1   6.812   <.01      65    22    87   1   9.885   <.01
      Resentful or indifferent       12    45    57                         21    25    46
      Total                          50    94   144                         86    47   133
   (b) Mother
      Concerned                      61    87   148   1   9.202   <.01     117    40   157   1  14.153  <.001
      Resentful or indifferent        4    30    34                         13    20    33
      Total                          65   117   182                        130    60   190
7. Number of siblings
   (a) 0                              4     5     9   6  11.359   <.10       7     3    10   6   6.923   <.50
       1                             17    15    32                         33    14    47
       2                             13    27    40                         28    11    39
       3                             16    35    51                         25    18    43
       4                             11    15    26                         15     7    22
       5                              4    11    15                         10     6    16
       More than 5                    3    20    23                         25     4    29
       Total                         68   128   196                        143    63   206
   (b) 0 or 1                        21    20    41   1   5.361   <.05      40    17    57   1      <1   n.s.
       More than 1                   47   108   155                        103    45   148
       Total                         68   128   196                        143    62   205
8. Birth order
   First                             25    47    72   3      <1   n.s.      47    23    70   3   2.905   <.50
   Second                            23    40    63                         43    17    60
   Middle                            14    31    45                         39    12    51
   Last (except second)               6    10    16                         14    10    24
   Total                             68   128   196                        143    62   205
9. Hometown size
   0-1000                             5    17    22   3   2.728   <.50      20     4    24   3   8.704   <.05
   1001-5000                          7    12    19                         30     9    39
   5001-25,000                       12    29    41                         29     8    37
   Above 25,000                      44    70   114                         64    42   106
   Total                             68   128   196                        143    63   206

For both the children and their parents,
rough measures of adaptive adequacy were 
recorded. For the children, school perform- 
ance was one obvious indicator of adaptive 
adequacy. If the child was regularly at- 
tending school, reports of his school per- 
formance were recorded in the categories of 
“below average,” “average,” and “above
average.” A chi-square comparing these 
three categories revealed that Internalizers 
of both sexes were performing significantly 
better in school (χ² = 21.806, df = 2,
p < .001 for males; χ² = 19.657, df =
2, p < .001 for females). Comparisons
of IQs were also made by using full scale 
Wechsler Intelligence Scale for Children 
(WISC) scores when available or Stanford- 
Binet scores in lieu of the WISC. There was 
no significant difference in IQ between In- 
ternalizers and Externalizers of either sex, 
although the mean IQs for Internalizers of 
both sexes were higher and the male differ- 
ence nearly reached significance (t = 1.939, 
df = 135, p < .10 for males; t < 1, df =
125, p = n.s. for females). A further analy- 
sis of IQ scores was made by comparing 
the verbal and performance scores on the 
WISC. For males, Internalizers had signifi- 
cantly higher verbal IQ scores than did Ex- 
ternalizers (t = 2.119, df = 89, p < .05), 
but there were no significant differences be- 
tween the verbal scores for female Inter- 
nalizers and Externalizers, nor between the 
performance scores of the two groups for 
either sex. A direct test of the difference be- 
tween S's verbal and performance scores 
showed male Externalizers to have signifi- 
cantly higher performance than verbal 
scores (t = 4.099, df = 59, p < .001), while 
female Internalizers also had significantly 
higher performance than verbal scores (t = 
3.317, df = 69, p < .01). There were no sig- 
nificant differences between performance 
and verbal scores for male Internalizers or 
female Externalizers. 
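
The school-performance comparison can be reproduced from the frequencies in Table 5. The sketch below is a plain Pearson chi-square for an r × c table, written without external libraries; the function name `pearson_chi_square` is hypothetical, and no continuity correction is applied because the table is larger than 2 × 2.

```python
def pearson_chi_square(table):
    """Pearson chi-square and degrees of freedom for an r x c frequency table.

    `table` is a list of rows of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Male school performance from Table 5: rows are below average, average,
# above average; columns are Internalizers, Externalizers.
male_school = [[17, 62], [15, 25], [13, 3]]
chi2, df = pearson_chi_square(male_school)
print(round(chi2, 3), df)  # 21.806 2
```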

The number of different categories of so- 
cial problems which S had been reported to 
manifest constituted an additional index of 
adaptive adequacy. The categories of prob- 
lems recorded were: (a) trouble with the 
police and being brought to court, (b) previous
psychiatric referral, (c) being expelled
from school, and (d) failing a grade in
school. No measure of the degree of difficulty
in any of the categories was attempted,
but it was assumed that the number
of categories in which entries occurred
would contribute a rough measure of the 
child's previous social adequacy. As the Ex- 
ternalizing pole that appeared in the factor 
analyses included several behaviors which 
could have resulted in entries in some of the 
problem categories, the basis for this index 
was not entirely independent of the Inter- 
nalizing-Externalizing classification. Its 
problem categories may be best regarded, 
perhaps, as general consequences of the type 
of adaptive pattern of which the Externaliz- 
ing symptoms are specific elements. Inter- 
nalizers of both sexes had far fewer of these 
problems than did Externalizers (χ² =
33.442, df = 1, p < .001 for males; χ² =
17.590, df = 1, p < .001 for females). For
all categories and each sex, the Externalizers
had proportionally more entries than did
the Internalizers, although not all the differences
were significant. For the males, the
differences were significant in three of the
categories (police, χ² = 16.058, df = 1,
p < .001; school failure, χ² = 4.704, df = 1,
p < .05; expelled from school, χ² = 14.766,
df = 1, p < .001). In the fourth category,
previous psychiatric referral, the expected
frequency in one cell reached only 4.54, so
the χ² of 1.512 is of questionable validity,
but is far from significant. For the females,
frequencies in the psychiatric and expelled
categories were too small for analysis. Externalizers
had significantly more trouble
with the police (χ² = 29.569, df = 1, p <
.001), but there was not a significant difference
in the school failure category (χ² < 1).
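
The 2 × 2 values above are corrected for continuity, and the caution about the psychiatric-referral category rests on the smallest expected cell frequency. Both quantities can be checked from the male frequencies in Table 5 with the short sketch below; the function names are hypothetical, and the correction shown is the usual Yates form, which is assumed here to be the one the study applied.

```python
def yates_chi_square(a, b, c, d):
    """Continuity-corrected chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    corrected = max(abs(a * d - b * c) - n / 2, 0)
    return n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def smallest_expected(a, b, c, d):
    """Smallest expected cell frequency in the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    return min(r * k / n for r in rows for k in cols)


# Males, previous psychiatric referral (rows: No, Yes; columns: Int., Ext.).
print(round(yates_chi_square(64, 112, 2, 11), 3))   # 1.512
print(round(smallest_expected(64, 112, 2, 11), 2))  # 4.54

# Males, expelled from school.
print(round(yates_chi_square(65, 93, 1, 30), 3))    # 14.766
```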

The adaptive adequacy index for the par- 
ents was composed of the following problem 
categories: (a) divorce, (b) psychiatric 
history, (c) criminal record, (d) frequent 
excessive use of alcohol, (e) unemployment, 
(f) having an illegitimate child, (g) desertion
of family, and (h) being charged with
neglect of children. Chi-square tests revealed
that fathers of Externalizers of both
sexes manifested more of these problems,
but the difference for females did not quite
reach statistical significance (χ² = 8.674,
df = 2, p < .02 for the males; χ² = 5.360,
df = 2, p < .07 for the females). Mothers
of Externalizers of both sexes also manifested
more problems, the difference being
highly significant in each case (χ² = 19.797,
df = 2, p < .001 for males; χ² = 14.412,
df = 2, p < .001 for females). The sum of
problems for both parents, or twice the score
of one parent if there were no data for the
other parent, showed the parents of Externalizers
of both sexes to have significantly
more problems (χ² = 27.844, df =
3, p < .001 for males; χ² = 13.714, df = 3,
p < .01 for females).
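
The combined parental score described above follows a simple rule that can be stated in code. This is a minimal sketch; the function name and the use of `None` to mark a parent with no data are assumptions made for illustration.

```python
def combined_parent_problems(father, mother):
    """Combined number of problem categories for both parents.

    Each argument is a count of problem categories, or None when the
    record holds no data for that parent. A missing parent is handled by
    doubling the available parent's score.
    """
    if father is None and mother is None:
        return None  # no parental data at all; case drops out of the analysis
    if father is None:
        return 2 * mother
    if mother is None:
        return 2 * father
    return father + mother


print(combined_parent_problems(1, 2))     # 3
print(combined_parent_problems(None, 1))  # 2
```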

A breakdown of the specific categories of 
problems showed that fathers of male Ex- 
ternalizers tended to have proportionately 
more entries in all categories except that of 
illegitimate children. However, these differences
were significant only for the category
of unemployment (χ² = 4.201, df = 1,
p < .05) and nonsignificant for divorce, alcohol,
and psychiatric history. In the categories
of criminal history, illegitimate children,
desertion, and neglect, the frequencies
were too small for valid chi-square analysis. 
The fathers of female Externalizers tended 
to have proportionally more entries in the 
categories of divorce, criminal history, al- 
cohol, desertion, and neglect. The difference 
was significant only for the category of 
divorce (x? = 10.702, df = 1, p < .01) and 
nonsignificant for the category of alcohol- 
ism. Frequencies were too small for chi- 
square analysis in the categories of criminal 
history, illegitimate children, desertion, and 
neglect. 

Mothers of Externalizers of both sexes 
tended to have proportionally more entries 
in the six categories where mothers had any 
entries at all (there were none in the crimi- 
nal or unemployment categories). These 
differences were significant for the divorce 
category (χ² = 5.915, df = 1, p < .02 for
males; χ² = 5.596, df = 1, p < .02 for females),
but nonsignificant for psychiatric
history for the mothers of both sexes and 
for illegitimate children for the mothers of 
males. All other categories contained fre- 
quencies too small for chi-square analysis. 

A comparison of the persons with whom
Internalizers and Externalizers were residing
showed that Internalizers of both sexes
were more frequently living with both natural
parents than were Externalizers (χ² =
4.999, df = 1, p < .05 for males; χ² = 3.845,
df = 1, p < .05 for females). Where sufficient
data were available, the attitude of
each parent or parent surrogate toward
having the child's problem treated was rated
in the categories of: (a) resentful; (b)
indifferent; and (c) concerned. Chi-square
comparisons showed that both parents of
Internalizers of both sexes were more often
rated “concerned” than were parents of
Externalizers (χ² = 6.812 for fathers, and
χ² = 9.202 for mothers, both df = 1, p <
.01 for males; χ² = 9.885, df = 1, p < .01
for fathers, χ² = 14.153, df = 1, p < .001 for
mothers of females).

A t-test comparison of the age of each 
parent when the child was born showed 
that both parents of male Internalizers were 
significantly older than those of male Externalizers
(t = 2.812, df = 179, p < .01
for fathers; t = 6.885, df = 187, p < .001
for mothers), but that there was only a nonsignificant
trend in the same direction for
females (t = 1.549, df = 185, p < .20 for 
fathers; t = 1.356, df = 192, p < .20 for 
mothers). 

In order to examine the relationship be- 
tween the social class component of social 
competence and the symptom patterns, a 
measure of parental social class obtainable 
from the case histories was necessary. In 
the Zigler-Phillips studies, the six social 
competence variables were scaled on equiv- 
alent scales. Those variables for which data 
were present in a given case history were 
averaged to yield the social competence 
score. For present purposes, where only a 
social class score rather than an overall so- 
cial competence score was sought, a some- 
what similar approach was taken. The six- 
step scale for education and a modification 
of the six-step scale for occupation em- 
ployed by Zigler and Phillips (1960), with 
occupation being classified by the Diction- 
ary of Occupational Titles (United States 
Government, 1949), were used to calculate 
the social class score for the head of the 
household. The education and occupation 
scores were averaged to yield the social 
class score. Where only one variable was
reported, it provided the social class score.
Cases where neither occupation nor education
was given, but where the family depended
upon public welfare assistance, were
automatically assigned to the lowest category.
Table B1 presents the scales and the
distribution of social class scores obtained 
by this method (Social Class Index I). 
There was a nonsignificant tendency for 
Internalizers of both sexes to have higher 
social class scores than did Externalizers 
(t = 1.685, df = 193, p < .10 for males; 
t = 1.218, df = 196, p < .20 for females). 
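
The scoring rule for Social Class Index I can be sketched as below.
The function name is illustrative. The numeric value of the lowest
category is an assumption (taken as 1 on the six-step scales, with
higher scores meaning higher standing), since Table B1 is not
reproduced here. Returning `None` for a case with no usable
information is likewise an assumption.

```python
from typing import Optional

LOWEST_CATEGORY = 1  # assumed lowest step of the six-step scales


def social_class_index_1(education: Optional[int],
                         occupation: Optional[int],
                         on_welfare: bool = False) -> Optional[float]:
    """Average the available education and occupation scores.

    With one score available, that score is used alone. With neither
    available, a family on public welfare gets the lowest category;
    otherwise the case is left unscored (None).
    """
    available = [s for s in (education, occupation) if s is not None]
    if available:
        return sum(available) / len(available)
    if on_welfare:
        return float(LOWEST_CATEGORY)
    return None


print(social_class_index_1(education=4, occupation=5))
print(social_class_index_1(education=None, occupation=3))
print(social_class_index_1(None, None, on_welfare=True))
```

Because the two scales are averaged, the index takes half-step
values, which is what gives the relatively continuous distribution
of scores.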

The scaling approach just described as- 
sumed that the averaging of available data 
gave roughly consistent estimates of social 
class standing, and it resulted in a rela- 
tively continuous distribution of scores. An- 
other approach to social class scaling is by 
attempting to assign individuals to a few 
discrete strata. Hollingshead and Redlich 
(1958), for example, employed the variables 
of occupation, education, and neighborhood 
to assign individuals to one of five classes 
which were thought to exist in the com- 
munity. To see if this approach would yield 
results different from those reported above, 
cases were classified using a modification of 
Hollingshead's “Two-Factor Index of So- 
cial Position" (1957). Normally, the two 
scores from Hollingshead's seven-step occu- 
pation scale and seven-step education scale 
are averaged, with occupation being given 
a weight of seven and the education score a 
weight of four. To obtain a small number of 
discrete social classes, only an occupation 
score (found by Hollingshead to be the best 
predictor of social class) was employed 
here; when it was not available, the educa- 
tion score was used instead; if both were 
unknown and the family was on welfare, the 
case was assigned to the lowest class. Except 
for the placement of service workers and the 
combining of the top two occupational and 
educational classes into one, the six-step 
scales presented with Table B1 resemble the 
Hollingshead scales rather closely. There- 
fore, in this second social class scaling pro- 
cedure (Social Class Index II), the occupa- 
tional and educational categories of Table 
B1 were followed, but the specific occupa- 
tional roles of service workers were scored 
according to the Hollingshead Index. For
example, a practical nurse, classified as a
personal service worker by the Dictionary
of Occupational Titles, would have been
assigned to Category 4 by the scale in Table
B1, but was listed by Hollingshead under
the same class heading as “Machine Operators
and Semi-skilled Employees.” Several
other specific occupational roles listed as
service workers, for example, policemen,
firemen, nightwatchmen, janitors, and waiters,
were also classified differently under the
Hollingshead Index. As with Social Class
Index I, there were nonsignificant tendencies
for Internalizers of both sexes to have the
higher social class scores (t = 1.587, df =
193, p < .20 for males; t = 1.225, df = 196,
p > .20 for females).
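
The contrast between the usual weighted combination and the
single-variable rule adopted for Social Class Index II can be
sketched as follows. The function names are illustrative. The
weighted score is written as a weighted mean because the text
describes the two scores as averaged with weights of seven and
four. The value of the lowest class is an assumption (taken as 1),
as is leaving a case unscored (`None`) when nothing is known.

```python
from typing import Optional

LOWEST_CLASS = 1  # assumed lowest of the discrete classes


def weighted_two_factor_score(occupation: int, education: int,
                              w_occ: int = 7, w_edu: int = 4) -> float:
    """Weighted mean of occupation and education scores (weights 7 and 4)."""
    return (w_occ * occupation + w_edu * education) / (w_occ + w_edu)


def social_class_index_2(occupation: Optional[int],
                         education: Optional[int],
                         on_welfare: bool = False) -> Optional[int]:
    """Occupation score alone; education if occupation is unknown;
    lowest class if both are unknown and the family is on welfare."""
    if occupation is not None:
        return occupation
    if education is not None:
        return education
    if on_welfare:
        return LOWEST_CLASS
    return None


print(weighted_two_factor_score(occupation=3, education=5))
print(social_class_index_2(occupation=None, education=2))
```

Using a single score keeps every case in one of a small number of
whole-numbered classes, whereas the weighted combination spreads
cases across intermediate values.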

The distributions of diagnoses are presented
in Table B3. Examination of the
remaining biographical variables showed the
following results: Internalizing males were
significantly older than Externalizing males
(t = 2.943, df = 194, p < .01), but there
was not a significant difference in age
between Internalizing and Externalizing
females (t < 1, df = 204, p = n.s.; age
distributions are presented in Table B2); there
was a tendency, approaching significance,
for Externalizing males to have more
siblings than did Internalizing males, but the
difference for the females was in the opposite
direction and nonsignificant (χ² = 11.359,
df = 6, p < .10 for males; χ² = 6.923, df =
6, p < .50 for females); 2 × 2 chi-squares
comparing the number of cases who had
zero or one sibling with those having more
siblings showed that Externalizing males
more frequently had more than one sibling
(χ² = 5.361, df = 1, p < .05; χ² < 1, df =
1, p = n.s. for females); there were no
significant differences in birth order for either
sex (χ² < 1, df = 3, p = n.s. for males;
χ² = 2.905, df = 3, p < .50 for females);
there was no significant difference in hometown
size for the two male groups (χ² =
2.728, df = 3, p < .50), but female
Externalizers came from larger towns than did
female Internalizers (χ² = 8.704, df = 3,
p < .05); there appeared to be no consistent
religious differences in the groups and no
grounds for combining low frequency
categories to make a chi-square test possible.

No independent analysis of the background
data for Ss classified by the second
principal factor was attempted because of
the general weakness of that factor as
compared to the first principal factor. Inspection
of biographical data for Ss classified by
the rotated factors° revealed that the female
Factor 11, Enuresis and Other Immaturities,
represented only very young children.
Five of the seven Ss were 4 years old
and two were 5, yielding a mean age of 4.29.
For both sexes, the Hyperreactive Behavior
factor also classified predominantly younger
Ss, the mean ages being 7.33 for the females
and 7.80 for the males. The Ss classified by
Factor 3, Obsessions, Compulsions, and
Phobias, had the highest mean IQ for each
sex (111.89 for males, 114.67 for females),
while Ss classified by the Aggressive Behavior
factor had the lowest mean IQ for the
females (86.50) and the second lowest for
the males (95.71). The same relationships
held for social class, with the Obsessive Ss
being the highest (M = 4.46 for males;
M = 3.96 for females), and the Aggressive
Ss being the lowest for each sex (M = 3.33
for males; M = 2.50 for females). Finally,
the parents of the Aggressive Ss had the
highest mean numbers of social problems in
each sex group (3.00 for males; 2.50 for
females).


Discussion 


With regard to the first purpose of the
study, the data indicate that child psychiatric
symptoms do indeed form both general
clusters like those found for adults by
Phillips and Rabinovitch and Guertin, and for
children by Hewitt and Jenkins, and more
specific clusters like those implied by
traditional diagnostic categories and found for
adults by Wittenborn. The types of symptoms
falling at the Internalizing pole of the
first principal factor for both sexes are
certainly consistent with those of Hewitt and
Jenkins’ Overinhibited Child cluster and,
except for the obvious developmental
differences, the symptoms of Phillips and
Rabinovitch's “self-deprivation and turning
against the self” cluster and Guertin’s
Guilt-Conflict cluster. Likewise, the symptoms at
the Externalizing pole of the first principal
factor for both sexes are consistent with the
Hewitt-Jenkins Socialized Delinquent and
Unsocialized Aggressive clusters, the
Phillips-Rabinovitch “self-indulgence and turning
against others” cluster, and the Guertin
Excitement-Hostility cluster.


° Two one-page tables presenting mean age, IQ,
parental problems, and social class scores of Ss
classified by the male and female rotated factors
have been deposited with the American
Documentation Institute. Order Document No. 8848
from ADI Publications Project, Photoduplication
Service, Library of Congress, Washington, D. C.
20540. Remit in advance $1.25 for microfilm or
$1.25 for photocopies and make checks payable to:
Chief, Photoduplication Service, Library of
Congress.

Many of the rotated factors which were 
considered reliable resemble traditional cat- 
egories and the factors found by Witten- 
born. The Somatic Complaints, Obsessions, 
Compulsions, and Phobias, and Schizoid 
Thinking and Behavior factors found for 
both sexes are reminiscent of Wittenborn’s 
“conversion hysteria,” “phobic compulsive,” 
and “schizophrenic excitement” syndromes. 
The Anxiety Symptoms and Depressive 
Symptoms factors found here for the girls 
resemble his “acute anxiety” and “depressed 
state” syndromes, respectively. Since his 
sample excluded sociopaths, syndromes cor- 
responding to the present Delinquent Be- 
havior and Aggressive Behavior factors 
would not have been expected to occur, but 
these may well be the child counterparts of 
the diagnostic categories of Dyssocial Reaction
and Antisocial Reaction, respectively.
The Enuresis and Other Immaturities
factor found for the girls is evidently
a syndrome peculiar to early childhood
(occurring only in 4- and 5-year-olds in the
present sample), and would not be expected 
in adult populations. The Hyperreactive 
Behavior factor also classified only young 
children in the present sample and may be- 
long to an early developmental stage or may 
be the early sign of what is later recognized 
as organic dysfunction, causing such individuals
to be then excluded from functional
categories. 

The Obesity factor classified only girls 
aged 10-14 and might therefore be regarded 
as a developmental phenomenon associated
with puberty and unlikely to occur independent
of other syndromes in an adult
sample. Likewise, the vast majority of Ss 
classified by the Neurotic and Delinquent 
Behavior factor were between the ages of 12 
and 15, suggesting that this too is a phenom- 
enon belonging to a specific developmental 
period and not to be expected in patients 
from other age groups. In passing, it is to be 
noted that in the male data factors similar 
to the female Depressive Symptoms and 
Obesity factors occurred in several rota- 
tions, but that they failed to meet the cri- 
terion of reliability employed. While the 
factors concluded to be reliable can prob- 
ably be regarded with a good deal of con- 
fidence, they cannot necessarily be re- 
garded as exhausting the child-symptom 
domain. The evidence indicates, however,
that general symptom clusterings, here la- 
beled Internalizing versus Externalizing, 
and specific syndromes like several of the 
traditional syndromes of adult diagnosis ex- 
ist in the child domain. In addition, there is 
evidence for several syndromes which are 
apparently peculiar to specific childhood 
age periods and which are not recognized 
in adult diagnosis. 

The discovery of numerous reliable factors
besides the major Internalizing-Externalizing
dichotomy means that the second
purpose of the study, that of obtaining a 
more differentiated operational classifica- 
tion schema for research purposes, has been 
in part fulfilled. The factorial results indi- 
cate that there are indeed several discrete 
and reliable clusterings of symptoms. The 
best means for classifying patients using 
these clusterings will depend upon the exact 
research purposes for which such a classi- 
fication schema is to be used and the heuris- 
tic value which it is found to possess. It 
would appear that the major Internalizing- 
Externalizing dichotomy can be readily 
used for classifying both case histories and 
live patients in the same way as it was used 
here. If 60% or more of a case’s symptoms 
came from the Internalizing cluster reported 
in Table 1 for males or Table 3 for females, 
the case would be assigned to the Internalizing
category, and likewise for the Externalizing
category. If only a few cases were
available, a more liberal criterion, such as
a simple majority of symptoms, could be
employed to classify them. If subtle differences
were being investigated, a more rigorous
criterion, such as 75% or 100%, could
be employed.
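
The percentage criterion just described can be sketched as a small
classification function. The symptom names and cluster memberships
below are purely illustrative placeholders and are not the contents
of Tables 1 or 3. The function name and the "Unclassified" label for
cases meeting neither criterion are assumptions consistent with the
groups named in the text.

```python
def classify_case(symptoms, internalizing, externalizing,
                  criterion_percent=60):
    """Assign a case to a cluster if at least `criterion_percent` of
    its symptoms belong to that cluster; otherwise leave it Unclassified.

    Integer arithmetic is used so that a case sitting exactly on the
    criterion (e.g. 3 of 5 symptoms at 60%) is counted as meeting it.
    """
    symptoms = set(symptoms)
    if not symptoms:
        return "Unclassified"
    n = len(symptoms)
    if len(symptoms & internalizing) * 100 >= criterion_percent * n:
        return "Internalizing"
    if len(symptoms & externalizing) * 100 >= criterion_percent * n:
        return "Externalizing"
    return "Unclassified"


# Illustrative clusters only (hypothetical, not the published tables).
INTERNALIZING = {"phobias", "worrying", "stomachaches"}
EXTERNALIZING = {"stealing", "fighting", "truancy"}

case = ["phobias", "worrying", "stomachaches", "fighting", "truancy"]
print(classify_case(case, INTERNALIZING, EXTERNALIZING))
print(classify_case(case, INTERNALIZING, EXTERNALIZING, criterion_percent=75))
```

Raising `criterion_percent` to 75 or 100 gives the more rigorous
variants mentioned above. A simple-majority rule would need a strict
"more than half" comparison rather than the "at least" comparison
used here, so that a case split evenly between the clusters stays
Unclassified.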

The classification of cases by the rotated
factors in Tables 2 and 4 could also be done
by the 60% criterion used here. However,
since the Ns classified here by some of the
rotated factors were quite small, a more
liberal criterion might generally be needed.
An obvious alternative to classification by
means of the percentage of symptoms
matching a factor would be to assign Ss to
a category if they manifested all or most of
the most heavily loaded symptoms on the
factor. Another approach would be to
classify Ss by the Internalizing and Externalizing
clusters, and also by those rotated factors
which were not clearly subsumed by the
Internalizing and Externalizing clusters, for
example, the Schizoid and Hyperreactive
factors.
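
The alternative of assigning Ss by their most heavily loaded
symptoms can be sketched in the same way. The loadings below are
invented for illustration and are not taken from Tables 2 or 4. How
many symptoms count as "most heavily loaded" (`top_k`) and how many
must be present to count as "all or most" (`min_present`) are left
open by the text, so both are hypothetical parameters.

```python
def matches_top_loaded(case_symptoms, loadings, top_k=5, min_present=4):
    """True if the case shows at least `min_present` of the `top_k`
    symptoms with the highest loadings on the factor."""
    top = sorted(loadings, key=loadings.get, reverse=True)[:top_k]
    observed = set(case_symptoms)
    present = sum(1 for symptom in top if symptom in observed)
    return present >= min_present


# Hypothetical loadings for one rotated factor (illustration only).
factor_loadings = {
    "symptom_a": 0.70,
    "symptom_b": 0.60,
    "symptom_c": 0.50,
    "symptom_d": 0.20,
}

print(matches_top_loaded(["symptom_a", "symptom_c"], factor_loadings,
                         top_k=3, min_present=2))
print(matches_top_loaded(["symptom_a", "symptom_d"], factor_loadings,
                         top_k=3, min_present=2))
```

Unlike the percentage criterion, this rule ignores how many other
symptoms a case has, so a case with many unrelated symptoms can
still be assigned to the factor.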

Regarding the third purpose of the study, 
the classification of individual cases by their 
degree of resemblance to both the rotated 
and unrotated factors revealed similar rela- 
tionships among the groupings for both 
sexes (see Figures 1 and 3). All Ss classified 
by the Aggressive Behavior and Delinquent 
Behavior factors had already been classi- 
fied by the Externalizing end of the first 
principal factor. Virtually all Ss classified 
by the Somatic Complaints and Obsessions, 
Compulsions, and Phobias factors had al- 
ready been classified by the Internalizing 
end of the first principal factor. Individuals 
classified by the Hyperreactive Behavior 
factor came from both the Externalizing 
and Unclassified groups of the first princi- 
pal factor, while individuals classified by 
the Schizoid factor tended to come from the 
Internalizing and Unclassified groups, with 
a few males coming from the Externalizing 
group. For both sexes then, the Aggressive 
and Delinquent factors may be regarded as 
representing subtypes within the general 
Externalizing category, while the Somatic 
and Obsessive factors represent subtypes 
within the general Internalizing category. 
The Hyperreactive and Schizoid factors
may be regarded as not being clearly subsumed
by the categories of the first principal
factor, although the Hyperreactive factor
clearly excludes Internalizers and the
Schizoid factor tends to exclude Externalizers.
Of the factors peculiar to the females,
the Anxiety factor appears clearly sub- 
sumed by the Internalizing category, while 
the Depressive and Obesity factors are not 
clearly subsumed by it but exclude Exter- 
nalizers. The Neurotic and Delinquent fac- 
tor, on the other hand, is not clearly sub- 
sumed by the Externalizing category, but 
tends not to include Ss classified as Inter- 
nalizers. Thus the relationship between the 
classification of Ss by the general categories 
and by the discrete syndromes is in part an 
hierarchical one, with several of the discrete 
syndromes representing clear subcategories 
of the general clusters, while several other 
of the discrete syndromes appear related to 
two of the general groupings, but are not 
neatly subsumed by a single one. The clas- 
sification of Ss by the second principal fac- 
tor for each sex was orthogonal to classifica- 
tion by the first principal factor (see Figures 
2 and 4) and was found to bear no consist- 
ent relationship to any of the rotated fac- 
tors. 

For both sexes, the Internalizing-Externalizing
dichotomy significantly discriminated
cases on many of the biographical
variables. The findings that Internalizers
were more frequently living with both natural
parents and that their parents had
fewer overt social problems than the parents
of the Externalizers corroborate the
findings of Bennett with regard to neurotic
and delinquent children and Hewitt and
Jenkins with regard to Overinhibited versus
Socialized Delinquent and Unsocialized
Aggressive children. The greater frequency
with which parents of Internalizers were
rated “concerned,” as contrasted with
“resentful” or “indifferent,” further suggests
that the Internalizers’ parents took more
responsibility for their children than did
Externalizers’ parents. The findings that
Internalizers of both sexes had significantly
fewer previous social problems and
significantly better school performance suggest
that, like their parents, the Internalizers
had been socially more adequate prior to
their psychiatric referral than were the
Externalizers.

An interesting secondary finding was that 
Externalizing males and Internalizing fe- 
males had significantly higher performance 
than verbal IQ scores on the WISC. It has 
previously been found (e.g., Glueck &
Glueck, 1950, 1959) that delinquent boys 
have significantly higher performance than 
verbal IQ scores, but evidently this rela- 
tionship has not been investigated for girls. 
If the finding that Internalizing girls have 
higher performance than verbal IQs is a 
reliable one, previous speculation about lack 
of symbolic ability in delinquent boys being 
related to their delinquent behavior may 
have to be reconsidered. 

The heuristic goal of the present study 
rests upon the interpretations of the em- 
pirical relationships discovered. To what 
extent can the observed relationships be- 
tween the two levels of factorial classifica- 
tion and the relationships between the fac- 
tors and the biographical variables lead to 
conceptual order and the eventual genera- 
tion of testable hypotheses? First, the 
grouping together of the Ss classified by the 
Somatic, Obsessive, and Anxiety Factors 
under the Internalizing category implies 
that these Ss have something in common 
which is reflected in the functional unity of 
the Internalizing cluster. On the other hand, 
the Aggressive and Delinquent Ss have 
something in common which is reflected in 
the functional unity of the Externalizing 
cluster. Second, the Hyperreactive, Schizoid, 
and Depressive factors appear not to be di- 
rectly related to the functional unities em- 
bodied in the first principal factor. Third, 
the second principal factor appears to be ir- 
relevant to the groupings defined by the first 
principal factor and the rotated factors. 

It is clear that the Externalizing symp- 
toms for both sexes represent behavior 
which is antisocial and which most people 
learn through negative sanctions not to per- 
form. For the behavior theorist, the obvious 
interpretation of the common feature in the 
Ss classified as Externalizers is that coun- 
terconditioning of antisocial behavior has 
not been successful. Bandura and Walters
(1959) concluded that one of the major
differences between normal and aggressive
adolescents was that antisocial behavior in 
normal adolescents was prevented by guilt 
feelings whereas aggressive adolescents 
showed an absence of guilt and were de- 
terred from misbehaving only when fear- 
arousing consequences were evident in the 
immediate situation. Bandura and Walters 
interpreted this to mean that successful so- 
cialization requires the substitution of less 
aggressive new responses for overt aggres- 
sion, rather than the displacement of the 
original responses onto new objects (p. 139). 
The findings of high frequencies of overt 
social problems and lack of parental concern
in the family backgrounds of the Externalizers
indicate that the social learning
regimes they experienced probably did not 
provide the combination of reward contin- 
gencies and good role models which are 
necessary both to deter antisocial responses 
and to promote socialized responses. Fol- 
lowing this line of reasoning, one could con- 
clude that the deviant nature of the behav- 
ioral reactions manifested by Externalizers 
inheres in the absence of the proper learn- 
ing of socialized behavior. The deviant be- 
havior manifested by the Internalizers, on 
the other hand, presupposes the acquisition, 
through a socialization process, of behav- 
ioral reactions which are not antisocial. 

If the foregoing interpretation is ac- 
cepted, it can be said that the Aggressive 
and Delinquent syndromes simply repre- 
sent subvarieties of individuals whose so- 
cial learning regimes have not successfully 
eliminated antisocial behavior. The So- 
matic, Obsessive, and Anxiety syndromes 
represent individuals whose social learning 
regimes have promoted adaptive patterns 
which are more socialized. What then of the 
Unclassified Ss and those represented by the 
Schizoid, Hyperreactive, Neurotic and De- 
linquent, and Depressive syndromes? First, 
it would appear that some of the Unclassi- 
fied Ss have mixtures of Externalizing and 
Internalizing symptoms, whereas others 
tend to have symptoms which were loaded 
on neither the Externalizing nor Internalizing
poles. The Ss classified by the Enuresis
and Other Immaturities factor may be of
this latter group, since nine of the 13 symptoms
on this factor had loadings smaller
than ±.200 on the principal factor and five
of the seven Ss classified by it came from
the Unclassified group, with one each from
the Externalizing and Internalizing groups.
Second, it would appear that some individuals
who belong to the Schizoid, Hyperreactive,
Neurotic and Delinquent, and Depressive
groups clearly belong to either the
Externalizing or Internalizing groups,
whereas others are from the Unclassified
group. It might therefore be concluded that
while some of these individuals have things 
in common with Internalizers or External- 
izers, the presence or absence of antisocial 
behavior is not a defining characteristic of 
their syndrome. For example, some of the 
Schizoid Ss may be Externalizers because 
the socialization process has not successfully 
eliminated antisocial behavior, while others 
are Internalizers because antisocial behav- 
ior has been successfully eliminated, but 
what they have in common is orthogonal to 
the Internalizing-Externalizing (or “social- 
ization”) dimension. Such a conclusion 
would be compatible with both organic and 
psychodynamic theories of schizophrenia 
which postulate a fundamental defect in the 
capacity for integrated behavior—the par- 
ticular behaviors manifested might be in- 
fluenced by social learning, but the under- 
lying functional unity which determines the 
schizoid patterning is independent of social- 
ization. 

If, for either psychological or organic 
reasons, the young children manifesting the 
Hyperreactive syndrome were unable to in- 
hibit impulse or to integrate their behavior, 
a variety of behavior might be expected to 
result which, while not readily suppressible 
by socialization, was not necessarily always 
antisocial. The Depressive syndrome classi- 
fies individuals from both the Internalizing 
and Unclassified groups. This may imply 
that, while not all Ss in this group are In- 
ternalizers, the condition is unlikely, per- 
haps because of general retardation in 
functioning, to be accompanied by much 
antisocial behavior. 

According to the above reasoning, the social
adjustment component of the social
competence concept would refer to the presence
of socialized adaptive patterns due to
learning. Insofar as symptoms are purely 
“behavioral reactions," those manifested by 
children and adults who have been well 
socialized should differ from those who have 
not been socialized in that the former will 
not include antisocial behavior. On the 
other hand, the social class component of 
adult social competence may measure the 
resultant of both a socialized adaptive pat- 
tern and the intelligence and social oppor- 
tunities which are needed for various occu- 
pational levels. If socialized adaptive 
patterns and intelligence are both necessary 
to the maintenance of high social class 
standing by adults in a society which allows 
upward and downward mobility, the corre- 
lation between these two variables and so- 
cial class should increase with age, as an in- 
dividual’s social class standing comes to 
depend more upon his own behavior and less 
directly on that of his parents. In an adult 
psychiatric population, such as that em- 
ployed in the Phillips-Zigler studies, the in- 
tercorrelation of social class (occupation- 
education), social adjustment (marital 
status-employment history), age, and intelligence
should be high and may well reflect
a unitary underlying pattern of adult adaptive
adequacy. However, in the present
data, the premorbid adequacy of children
and their parents were significantly related 
to the Internalizing-Externalizing dimen- 
sion in symptoms, but intelligence of the 
child and social class of the parents showed 
only nonsignificant relationships to the In- 
ternalizing-Externalizing dimension. This 
suggests that the Internalizing symptom 
pattern and the prevention of antisocial be- 
havior depend more upon parents who 
themselves do not display antisocial behav- 
ior than upon the intelligence of the child 
or the social class correlates of his environ- 
ment. 

A final qualification to the above
interpretation of the factorial classification must
be added. The factors are based only upon
symptoms which are defined by relatively
peripheral behavior, rather than by experiential
or central variables. The factors are
thus no more than collections of observed
behaviors. Another approach to the statistical
classification of psychiatric disorders
has been by the intercorrelation of personality
test scores. Eysenck (cf. Eysenck, 1961,
pp. 1-31) has pursued this approach most
extensively and has repeatedly found two
dimensions which he has labeled
Introversion-Extraversion and Neuroticism.
Psychiatric patients diagnosed hysterical or
psychopathic have been found to have factor
scores high on Neuroticism and high on
Extraversion; patients diagnosed anxious or
obsessional have been high on Neuroticism
and high on Introversion. Insofar as the
present Somatic, Delinquent, Anxiety, and
Obsessive Ss correspond to Eysenck’s
diagnostic groups, there is an evident contrast in
that Somatic, Anxiety, and Obsessive Ss
were found here to group together at what
has been called the Internalizing pole of the
first principal factor whereas Delinquent Ss
fell at the Externalizing pole. Since the
present data consisted of child symptoms and
Eysenck's data were adult test scores, the
findings are not directly contradictory. In
fact, further consideration suggests that
they may be complementary. If the symptoms
of these four groups are regarded as
learned behavior, and the third dimension of
successful socialization versus unsuccessful
socialization is added to Eysenck’s two
dimensions, it can be seen that, while
psychopaths and hysterics would be similar in
being extroverted, hysterics would be
similar to anxiety and obsessional Ss in not
manifesting antisocial behavioral reactions.
There might thus be three independent
dimensions along which psychiatric disorders
of these four types could be classified:
Eysenck’s two dimensions assume genetic
predispositions (“inherited autonomic
overreactivity” for Neuroticism, and “strong
conditionability” for Introversion, p. 21),
while the present dimension involves a
fundamental distinction in patterns of
socialization.

Eysenck has also produced evidence for 
a psychoticism dimension in test scores. If 
such a dimension is regarded simply as one 
of disturbance not readily shaped by social- 
ization, it might be relevant to the Schizoid 
and Hyperreactive factors found here, or to
the second principal factor. Another possible
interpretation of the second principal
factor is that it represents the intensity of 
disturbance and classifies very disturbed in- 
dividuals regardless of the type of syn- 
drome manifested. 


SUMMARY AND CONCLUSIONS 


Symptoms and biographical data were 
recorded from the case histories of 300 male 
and 300 female child psychiatric patients. 
The symptoms were intercorrelated and fac- 
tor analyzed, separately for each sex, by the 
principal factor method, and the factors 
were rotated to the varimax, quartimax, and 
oblimin criteria for simple structure. The 
first principal factor for both sexes was bi- 
polar, with antisocial behavior (“External- 
izing”) at one end and symptoms of inter- 
nal problems (“Internalizing”) at the other 
end. Several factors occurred repeatedly in 
the different rotations and were therefore 
considered reliable. For both sexes, factors
given the labels of Somatic Complaints,
Obsessions, Compulsions, and Phobias, Delinquent
Behavior, Aggressive Behavior,
Hyperreactive Behavior, and Schizoid 
Thinking and Behavior were found. For the 
boys alone, a factor labeled Sexual Prob- 
lems, and, for the girls alone, factors labeled 
Depressive Symptoms, Anxiety Symptoms, 
Neurotic and Delinquent Behavior, Enure- 
sis and Other Immaturities, and Obesity 
were found. 

Cases were classified according to their 
resemblance to the poles of the principal 
factor and to the rotated factors. If 60% or 
more of an S's symptoms came from a given 
factor, or pole of the principal factor, he 
was placed in the category represented by 
that factor or pole. This revealed that all
Ss classified by the Aggressive and Delinquent
rotated factors of both sexes were
also classified by the Externalizing pole of
the principal factor. Almost all Ss classified
by the Somatic and Obsessive rotated fac- 
tors of both sexes and the Anxiety factor of 
the girls were also classified by the Inter- 
nalizing pole of the first principal factor. 
Classification by the other rotated factors 
bore no consistent relation to one or the 
other pole of the principal factor. 
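
The 60% classification rule described above can be illustrated with a short sketch. The function name, the symptom-to-factor assignments, and the example case are hypothetical and are not taken from the study; only the 60% criterion comes from the text.

```python
from collections import Counter

def classify_case(symptoms, factors_of, threshold_percent=60):
    """Return every factor (or pole) that accounts for at least
    threshold_percent of the case's recorded symptoms."""
    if not symptoms:
        return []
    counts = Counter()
    for symptom in symptoms:
        for factor in factors_of.get(symptom, ()):
            counts[factor] += 1
    total = len(symptoms)
    # Integer comparison avoids floating-point trouble at exactly 60%.
    return sorted(f for f, n in counts.items()
                  if n * 100 >= threshold_percent * total)

# Hypothetical symptom-to-factor assignments, for illustration only.
FACTORS_OF = {
    "Stealing": {"Externalizing", "Delinquent Behavior"},
    "Running away": {"Externalizing", "Delinquent Behavior"},
    "Truancy": {"Externalizing", "Delinquent Behavior"},
    "Fighting": {"Externalizing", "Aggressive Behavior"},
    "Headaches": {"Internalizing", "Somatic Complaints"},
}

case = ["Stealing", "Running away", "Truancy", "Fighting", "Headaches"]
placements = classify_case(case, FACTORS_OF)
```

Because the criterion is applied to each factor and pole separately, a case can be placed under a rotated factor and under a pole of the principal factor at the same time, which is how the overlap reported above can arise.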

Because the parents of Externalizers were 
found to have significantly more overt so- 
cial problems and to be rated less concerned 
with their child’s difficulty, it was suggested 
that the Externalizing symptoms reflected 
a learning regime in the child’s home lead- 
ing to antisocial behavior, while the Inter- 
nalizing symptoms presupposed the learning 
of more socialized behavioral reactions. The 
Aggressive and Delinquent factors were 
concluded to be subvarieties of the general 
category of unsocialized behavioral reac- 
tions, while the Somatic, Obsessive, and 
Anxiety factors were concluded to be subvarieties
of the general category of socialized
behavioral reactions. The other rotated
factors were concluded to represent func- 
tional unities which were relatively inde- 
pendent of the socialization dimension. 
Some of these appeared to be peculiar to 
specific developmental stages. 

The overall results showed that the fac- 
tors obtained can be used directly for the 
classification of child psychiatric cases for 
research purposes. The more general and
more specific clusterings of symptoms may
be employed independently or in a hierarchical
ordering. The relationships among
the clusterings suggested that different diagnostic
models may ultimately be appropriate
for different syndromes.


APPENDIX A 
SYMPTOMS (OCCURRING WITHIN THREE YEARS OF ADMISSION)a

a M and F indicate symptoms which occurred five or more times in the male and female samples,
respectively.
[The M and F markers for Items 4-46, and the text of Items 40 and 41, are illegible in this copy.]

M, F   1. Apathy, underactive, no initiative, slow, lethargic
       2. Assault with weapon
M, F   3. Asthma
       4. Attention demanding
       5. Believes he is evil
       6. Bizarre behavior
       7. Breathholding
       8. Breathing difficulty
       9. Can’t concentrate, distractible, short attention span
      10. Claustrophobia
      11. Complains no one loves him, feels rejected
      12. Compulsions
      13. Confused
      14. Constipation
      15. Cruelty, bullying, meanness
      16. Crying
      17. Daydreaming, excessive fantasy
      18. Depression, unhappiness, sadness
      19. Destructive
      20. Diarrhea
      21. Diplopia, blurred vision, tubular vision, microscopia
      22. Disobedient, rebellious, discipline problem
      23. Dizziness
      24. Encopresis, soiling
      25. Enuresis, wetting
      26. Excessive talking, chattering
      27. Fainting
      28. Fantastic thinking, delusions, hallucinations
      29. Fearful, anxious
      30. Fears own impulses
      31. Feelings of worthlessness, inadequacy, inferiority
      32. Fighting, assault, aggressive behavior
      33. Fire-setting
      34. Glue-sniffing, addictions
      35. Grinding teeth
      36. Headaches
      37. Ideas of reference, feels persecuted, suspicious
      38. Inadequate guilt feelings
      39. Inappropriately indifferent, e.g., to physical complaints, la belle indifference
      40. [illegible]
      41. [illegible]
      42. Loudness
      43. Lying, cheating
      44. Masochism, self-harm, suicidal, threatens to kill self
      45. Masturbation
      46. Moodiness, rapid change of mood
M, F  47. Nailbiting
M, F  48. Nausea, feels sick
M, F  49. Negativistic, stubborn, sullen, irritable
M, F  50. Nervous, high strung
M, F  51. Nightmares
M, F  52. Obese
M, F  53. Obsessions
F     54. Overeating
M, F  55. Overtired, fatigued, drowsy
M, F  56. Pains, physical complaints
      57. Peeping, voyeurism
M, F  58. Phobias, fears
F     59. Picking
M, F  60. Poor motor coordination
M, F  61. Poor school work
M, F  62. Quarrelsome
M, F  63. Refusing to eat, not eating well
M, F  64. Refusing to talk, mute
M, F  65. Restless, hyperactive
      66. Ritualistic behavior
M, F  67. Running away
M, F  68. Seclusive
M, F  69. Self-conscious
M, F  70. Sexual delinquency, incest, homosexuality
M     71. Sexual perversions, exposing self
M, F  72. Sexual preoccupation, precociousness
M     73. Showing off
M, F  74. Shy, timid, submissive
      75. Silliness
M, F  76. Skin eruptions
M     77. Sleepwalking
      78. Smearing feces
M, F  79. Stealing
M, F  80. Stomachache, cramps, abdominal pain
M, F  81. Stuttering, speech problem
M, F  82. Swearing
M, F  83. Temper tantrums
M, F  84. Threatening people
M, F  85. Thumbsucking
M, F  86. Truancy
M, F  87. Tics, trembling, shaking
M     88. Vandalism
M, F  89. Vomiting
M, F  90. Withdrawn
M, F  91. Worrying
APPENDIX B

TABLE B1
SOCIAL CLASS INDEX

         |                     Males                     |                    Females
Social   | Internal- | Unclassi- | External- |           | Internal- | Unclassi- | External- |
class    |   izing   |   fied    |   izing   |  Totals   |   izing   |   fied    |   izing   |  Totals
         |   N     % |   N     % |   N     % |   N     % |   N     % |   N     % |   N     % |   N     %
1.0      |   2   3.0 |   6   5.8 |  14  10.9 |  22   7.4 |  14  10.3 |   8   8.7 |  12  19.4 |  34  11.7
1.5      |   0   0.0 |   2   1.9 |   3   2.3 |   5   1.7 |   2   1.5 |   4   4.3 |   0   0.0 |   6   2.1
2.0      |   9  13.4 |  11  10.6 |  13  10.2 |  33  11.0 |  16  11.8 |  19  20.7 |  11  17.7 |  46  15.9
2.5      |   1   1.5 |   2   1.9 |   2   1.6 |   5   1.7 |   5   3.7 |   2   2.2 |   3   4.8 |  10   3.4
3.0      |  16  23.9 |  24  23.1 |  28  21.9 |  68  22.7 |  37  27.2 |  20  21.7 |   7  11.3 |  64  22.1
3.5      |   2   3.0 |   4   3.8 |   6   4.7 |  12   4.0 |   4   2.9 |   4   4.3 |   3   4.8 |  11   3.8
4.0      |  16  23.9 |  23  22.1 |  28  21.9 |  67  22.4 |  34  25.0 |  21  22.8 |  17  27.4 |  72  24.8
4.5      |   0   0.0 |   4   3.8 |   8   6.3 |  12   4.0 |   3   2.2 |   4   4.3 |   0   0.0 |   7   2.4
5.0      |  10  14.9 |  16  15.4 |  13  10.2 |  39  13.0 |  10   7.4 |   4   4.3 |   5   8.1 |  19   6.6
5.5      |   1   1.5 |   4   3.8 |   1   0.8 |   6   2.0 |   0   0.0 |   2   2.2 |   0   0.0 |   2   0.7
6.0      |  10  14.9 |   8   7.7 |  12   9.4 |  30  10.0 |  11   8.1 |   4   4.3 |   4   6.5 |  19   6.6
Totals   |  67 100.0 | 104  99.9 | 128 100.2 | 299  99.9 | 136 100.1 |  92  99.8 |  62 100.0 | 290 100.1
No data  |   1   1.5 |   0   0.0 |   0   0.0 |   1   0.3 |   7   4.9 |   2   2.1 |   1   1.6 |  10   3.3


Note.—The occupation and education scores of the head of the family were averaged to give the
social class score. If only occupation or only education was known, it was used to obtain the social
class score.

Occupation scale
1 = unskilled
2 = semiskilled
3 = skilled, domestic service, building service (maintenance)
4 = clerical, minor sales, technical, personal and protective service, owners of farms and little businesses
5 = owners and managers of small businesses, semiprofessionals, major salesmen, administrative personnel of large firms
6 = professional and managerial, owners of medium and large businesses

Education scale
1 = less than eighth grade
2 = completed eighth or ninth grade
3 = completed tenth or eleventh grade
4 = graduated from high school
5 = 1 year or more of college, business college, art school, etc.
6 = completed 4 years of college or more
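
The scoring rule in the note to Table B1 can be written out as a small sketch. The function name and the use of None for a missing score are assumptions made for illustration; the averaging rule and the single-score fallback are as stated in the note.

```python
def social_class_score(occupation=None, education=None):
    """Average the 1-6 occupation and education scores of the head of
    the family; fall back to whichever one is known."""
    known = [s for s in (occupation, education) if s is not None]
    if not known:
        return None  # would be counted under "No data" in Table B1
    return sum(known) / len(known)

skilled_hs_graduate = social_class_score(occupation=3, education=4)
occupation_only = social_class_score(occupation=5)
```

Averaging two whole-number scales is what produces the half-point steps (1.5, 2.5, and so on) in the social class column of the table.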


TABLE B2
AGE

         |                     Males                     |                    Females
         | Internal- | Unclassi- | External- |           | Internal- | Unclassi- | External- |
Age      |   izing   |   fied    |   izing   |  Totals   |   izing   |   fied    |   izing   |  Totals
         |   N     % |   N     % |   N     % |   N     % |   N     % |   N     % |   N     % |   N     %
4        |   ?     ? |   ?     ? |   ?     ? |   ?     ? |   1   0.7 |   4   4.3 |   4   6.3 |   9   3.0
5        |   ?     ? |   ?     ? |   ?     ? |   ?     ? |   1   0.7 |   5   5.3 |   2   3.2 |   8   2.7
6        |   3   4.4 |   7   6.7 |   7   5.5 |  17   5.7 |   8   5.6 |   3   3.2 |   3   4.8 |  14   4.7
7        |   ?     ? |   ?     ? |   9   7.0 |  17   5.7 |   9   6.3 |   7   7.4 |   2   3.2 |  18   6.0
8        |   4   5.9 |  14  13.5 |  17  13.3 |  35  11.7 |  11   7.7 |   6   6.4 |   1   1.6 |  18   6.0
9        |   5   7.4 |  14  13.5 |  11   8.6 |  30  10.0 |  10   7.0 |  14  14.9 |   6   9.5 |  30  10.0
10       |   8  11.8 |  10   9.6 |  15  11.7 |  33  11.0 |  17  11.9 |  11  11.7 |   2   3.2 |  30  10.0
11       |   8  11.8 |   7   6.7 |   9   7.0 |  24   8.0 |  13   9.1 |   7   7.4 |   3   4.8 |  23   7.7
12       |  12  17.6 |  11  10.6 |   9   7.0 |  32  10.7 |  12   8.4 |   5   5.3 |  11  17.5 |  28   9.3
13       |   9  13.2 |  16  15.4 |  16  12.5 |  41  13.7 |  18  12.6 |   7   7.4 |   5   7.9 |  30  10.0
14       |   6   8.8 |   9   8.7 |  16  12.5 |  31  10.3 |  18  12.6 |  12  12.8 |  13  20.6 |  43  14.3
15       |  10  14.7 |   5   4.8 |   7   5.5 |  22   7.3 |  25  17.5 |  13  13.8 |  11  17.5 |  49  16.3
Totals   |  68 100.1 | 104 100.1 | 128 100.0 | 300 100.1 | 143 100.1 |  94  99.9 |  63 100.1 | 300 100.1

[? = illegible in this copy]
TABLE B3
DIAGNOSIS

                               |                     Males                     |                    Females
                               | Internal- | Unclassi- | External- |           | Internal- | Unclassi- | External- |
Diagnosis                      |   izing   |   fied    |   izing   |  Totals   |   izing   |   fied    |   izing   |  Totals
                               |   N     % |   N     % |   N     % |   N     % |   N     % |   N     % |   N     % |   N     %
 1. Adjustment reaction of
    adolescence                |  10  14.7 |  18  17.3 |  23  18.0 |  51  17.0 |  18  12.6 |  16  17.0 |  15  23.8 |  49  16.3
 2. Adjustment reaction of
    childhood                  |   9  13.2 |  16  15.4 |  16  12.5 |  41  13.7 |  13   9.1 |  11  11.7 |   6   9.5 |  30  10.0
 3. Adjustment reaction
    with habit disturbance     |   1   1.5 |   0   0.0 |   4   3.1 |   5   1.7 |   2   1.4 |   4   4.3 |   1   1.6 |   7   2.3
 4. Adjustment reaction
    with conduct disturbance   |   5   7.4 |   8   7.7 |  24  18.8 |  37  12.3 |   0   0.0 |   6   6.4 |   5   7.9 |  11   3.7
 5. Adjustment reaction
    with neurotic traits       |   5   7.4 |   6   5.8 |   3   2.3 |  14   4.7 |  11   7.7 |   5   5.3 |   1   1.6 |  17   5.7
 6. Dissociative reaction,
    anxiety reaction           |   1   1.5 |   4   3.8 |   3   2.3 |   8   2.7 |  12   8.4 |   1   1.1 |   1   1.6 |  14   4.7
 7. Conversion reaction        |   2   2.9 |   2   1.9 |   0   0.0 |   4   1.3 |  10   7.0 |   2   2.1 |   1   1.6 |  13   4.3
 8. Depressive reaction        |   2   2.9 |   1   1.0 |   0   0.0 |   3   1.0 |   6   4.2 |   2   2.1 |   0   0.0 |   8   2.7
 9. Obsessive-compulsive
    reaction                   |   4   5.9 |   1   1.0 |   1   0.8 |   6   2.0 |   2   1.4 |   0   0.0 |   0   0.0 |   2   0.7
10. Phobic reaction            |   2   2.9 |   0   0.0 |   0   0.0 |   2   0.7 |   3   2.1 |   0   0.0 |   0   0.0 |   3   1.0
11. Psychophysiological
    reaction                   |   1   1.5 |   5   4.8 |   0   0.0 |   6   2.0 |   8   5.6 |   0   0.0 |   0   0.0 |   8   2.7
12. Schizophrenic reaction,
    all types                  |   4   5.9 |   2   1.9 |   2   1.6 |   8   2.7 |   2   1.4 |  10  10.6 |   0   0.0 |  12   4.0
13. Emotionally unstable
    personality, inadequate
    personality                |   1   1.5 |   1   1.0 |   1   0.8 |   3   1.0 |   0   0.0 |   2   2.1 |   2   3.2 |   4   1.3
14. Passive-aggressive
    personality                |   0   0.0 |   3   2.9 |   1   0.8 |   4   1.3 |   1   0.7 |   0   0.0 |   2   3.2 |   3   1.0
15. Personality trait
    disturbance                |   0   0.0 |   1   1.0 |   3   2.3 |   4   1.3 |   0   0.0 |   1   1.1 |   0   0.0 |   1   0.3
16. Schizoid personality       |   3   4.4 |   1   1.0 |   0   0.0 |   4   1.3 |   2   1.4 |   1   1.1 |   0   0.0 |   3   1.0
17. Sociopathic personality,
    psychopathic personality   |   0   0.0 |   0   0.0 |   1   0.8 |   1   0.3 |   0   0.0 |   0   0.0 |   2   3.2 |   2   0.7
No diagnosis                   |  18  26.5 |  35  33.7 |  46  35.9 |  99  33.0 |  53  37.1 |  33  35.1 |  27  42.9 | 113  37.7
Totals                         |  68 100.1 | 104 100.2 | 128 100.0 | 300 100.0 | 143 100.1 |  94 100.0 |  63 100.1 | 300 100.1
THE INFLUENCE OF BELIEF SYSTEMS ON
INTERPERSONAL PREFERENCE:


A VALIDATION STUDY OF ROKEACH’S THEORY OF PREJUDICE¹


DAVID D. STEIN²


University of California, Berkeley

A full-scale test of Rokeach’s theory of belief prejudice with 630 9th-grade
students strongly supports the validity of the theory. When information
about a stimulus person's beliefs in the area of personal values is
made available, similarity or dissimilarity in beliefs is the primary deter- 
minant of attitudes of white gentiles toward Negroes and Jews. These 
results also hold for Negro and Jewish students’ attitudes towards mem- 
bers of the majority. Only secondarily does racial or religious affiliation per 
se, or high versus low relative socioeconomic status, influence the students’ 
feelings (friendliness measure) and action orientations (social distance scale) 
toward others. In response to individual social distance items, gentile Ss 
showed relative unwillingness to interact with Negroes as compared with 
whites in “sensitive” areas of interracial contact. Similar results, but to a much 
lesser degree, were obtained for anticipated interaction with Jewish stimulus 
persons. Gentile Ss’ responses on another occasion to an otherwise unde- 
scribed “Negro teenager” correlated substantially with their responses 
to a lower status Negro to whom values unlike their own were ascribed. 
Other data indicate strong race and religion effects and a weaker status 
effect in the absence of information about stimulus persons’ beliefs.


One of the many ideas presented in
The Open and Closed Mind (Rokeach,
Smith, & Evans, 1960) is that prejudice
may be in large part the result of perceived
dissimilarity of belief systems. In
essence, Rokeach et al. (1960) contended
that the prejudiced person does not reject a
person of another race, religion, or nationality
because of his ethnic membership per
se, but rather because he perceives that the 
other differs from him in important beliefs 
and values. This specific hypothesis was de- 
veloped out of Rokeach's general theoreti- 
cal framework, in which the emphasis is 
on cognitive determinants of social behavior 


1 This paper is based on a doctoral dissertation 
submitted to the Graduate Division and Psychol- 
ogy Department of the University of California, 
Berkeley, June 1965. This research was supported 
by Grant MH 10610-01 from the National Insti- 
tutes of Health, United States Public Health 
Service to M. Brewster Smith, principal investi- 
gator. The author wishes to express his gratitude 
to Professor Smith and to Jane Allyn Hardyck 
for their helpful suggestions and critical com- 
ments in the preparation of this paper. 

2 Now at the Department of Psychiatry, Albert
Einstein College of Medicine, Yeshiva University.
and belief systems are given focal attention.

Rokeach et al. report two studies in 
which subjects were asked to rate pairs of 
stimulus individuals on a 9-point scale, de- 
fined at the ends by the statements, “I can't 
see myself being friends with such a per- 
son" and “I can very easily see myself be- 
ing friends with such a person.” In one ex- 
periment, the stimulus individuals were 
white or Negro; in the other they were 
Jewish or gentile. Reported beliefs of the 
stimulus individual concerning racial, reli- 
gious, and other matters were also varied. 
It was found that the friendship preferences 
expressed were determined primarily on the 
basis of congruence in beliefs rather than 
on racial or religious grounds. 

The presentation of the theory of belief 
prejudice, based on these preliminary ex- 
periments, has led to a number of studies 
that both lend support to and qualify the 
basic tenets of the theory (Byrne & Wong, 
1962; Rokeach & Mezei, 1966; Stein, Har- 
dyck, & Smith, 1965; Triandis, 1961; Tri- 
andis & Davis, 1965). These studies have 
developed along the following lines: 


1. Triandis (1961) objected to the use by 
Rokeach et al. of a single dependent vari- 
able of friendship and instead employed a 
social distance scale. Varying race, religion, 
occupational status, and similarity of phi- 
losophy of life to that of the subject, 
Triandis obtained a “race effect" that ac- 
counted for about four times as much vari- 
ance as any of the other three effects singly. 
Triandis concluded that race, rather than 
belief, is the primary determinant of preju- 
dice. 

2. Rokeach (1961) criticized Triandis’ 
manipulation of similarity of philosophy 
via Morris’ (1956) “13 ways to live,” as
based on complex and diffuse paragraphs 
that would not make similarity or differ- 
ence of beliefs salient to the subjects. 

3. Byrne and Wong (1962) essentially 
supported Rokeach's position, employing 
alleged responses to an attitude question- 
naire as the basis for manipulating simi- 
larity of belief, and personal feelings of 
friendliness and willingness to work to- 
gether in an experiment as dependent vari- 
ables. 

4. Stein et al. (1965), attempting to rec- 
oncile the disparate findings, followed 
Byrne and Wong in constructing stimulus 
individuals who were intended to appear 
more real to their subjects than had been 
the case in the Rokeach et al. and Triandis 
studies. Their modification also required 
subjects to respond to stimulus persons in- 
dividually rather than in pairs in order to 
minimize any awareness that the choice was 
between response in terms of race or of be- 
lief. As dependent variables, Stein et al. 
employed both a measure of friendly feel- 
ings and a social distance scale, on which 
responses to each of the individual items as 
well as to the total scale could be sepa- 
rately analyzed. Their findings, which pro- 
vide the starting point for the present 
study, included the following: 

First, in the analysis of "friendliness" 
responses and total social distance scale 
scores, belief accounted for much more var- 
iance than race, although both effects were 
significant. Secondly, strong “race effects” 
were obtained on “sensitive” items in the 
social distance scale, perhaps reflecting institutionalized
areas of prejudice. There
were significant race effects, and, to a lesser 
degree, status effects on the total scores on 
a social distance scale administered to the 
same subjects on a previous occasion when 
race and status had been varied and no in- 
formation about beliefs provided. Thirdly, 
a correlational analysis showed that subjects
responded to a Negro stimulus person
presented as unlike them in values in much
the same way as they had previously responded
to an otherwise unspecified Negro
about whom they had no other information
(r = .62). This correlation was interpreted
to mean that, in the absence of other infor- 
mation, subjects assume that Negroes are 
unlike them in values. Stein et al. (1965) 
concluded: 


When subjects are forced to evaluate stimulus 
individuals in terms of their beliefs, then belief 
congruence is more important than race. But 
when the belief component is not provided, spelled 
out in considerable detail, subjects will react in 
racial terms on the basis of assumptions concerning 
the belief systems of others, and of emotional or 
institutionalized factors [p. 289]. 


5. Rokeach and Mezei (1966) were in- 
terested in seeing if the theory of belief 
prejudice could be generalized beyond the 
pencil-and-paper test situations to behavior 
in representative real life settings. In three 
interrelated experiments, a naive subject 
was asked either to state a preference for 
two of four confederates to take a coffee 
break, or to choose among fellow “job ap- 
plicants” the two with whom he would 
most like to work. Two of the four con- 
federates were Negro and two were white. 
One of each race agreed and one disagreed 
with the subject. The authors conclude that 
in all three experiments, similarity of be- 
lief is the most frequent basis of subjects’ 
choices. 

6. Recently, Triandis and Davis (1965) 
reported a study in which 300 subjects re- 
sponded on 12 semantic and 15 Behavioral 
Differential scales to eight stimulus persons 
generated by all possible combinations of 
the characteristics Negro-white, male-fe- 
male, and pro or con civil rights legislation. 
Some subjects proved to be extremely sen- 
sitive to the race of the stimulus persons 
while other subjects showed a greater sensitivity
to the beliefs of the stimulus persons.
The likelihood of a person's showing
sensitivity to race as compared to belief is 
related to the degree of intimacy of the be- 
havioral situations described. For almost 
all subjects, the more intimate the behav- 
iors, the more frequently did subjects re- 
spond in terms of race. For the least inti- 
mate behaviors, most subjects responded in 
terms of belief. When the behaviors were 
intermediate in intimacy, subjects charac- 
terized independently as “racially preju- 
diced” responded in terms of race, and sub- 
jects characterized independently as “belief 
prejudiced” responded in terms of belief. 
These findings are generally consistent with 
those of Stein et al., especially with regard 
to the importance of race in determining 
responses to intimate items on a social dis- 
tance scale. Although Stein et al. did not 
have an independent measure of their sub- 
jects’ prejudicial orientation, that is, belief 
or race, they found, contrary to Triandis 
and Davis, that belief was equally impor- 
tant throughout the social distance scale. 


RATIONALE AND AIMS 


The present research was undertaken to 
replicate the original study by Stein et al. 
(1965) with a more adequate sample, and 
to elaborate upon it in a variety of ways. 
In connection with another study,³ questionnaires
had been administered in the
Spring of 1963 to the entire eighth grade of
the Commutertown⁴ public school system.
Crucial for the present study was the in- 
clusion of a series of items tapping the re- 
spondents' beliefs in the area of personal 
values. The data to be reported here were 
collected from the same students in the late 


3 A large-scale study of adolescent intergroup
relations and attitudes being conducted by Jane 
Allyn Hardyck and M. Brewster Smith through 
the Survey Research Center, University of Cali- 
fornia, with the support of the Anti-Defamation 
League of B'nai B'rith provided the opportunity 
for the present investigation. 

4 Commutertown is the fictitious name given to
a Northeast suburban city. I am indebted to the 
superintendent and staff of the Commutertown 
schools, who must remain anonymous, for their 
cooperation, and to Oscar Cohen of the Anti-Def- 
amation League for his part in securing it. 


Spring of 1964 when they were ninth grad- 
ers. 

Specifically, the present study replicates 
all of the analyses of white gentile students’ 
attitudes toward Negro stimulus teenagers. 
By way of extension, it assesses subjects’ 
responses to stimulus teenagers composed 
so as to vary systematically not only race 
(white versus Negro) and similarity of be- 
lief, but also social status. Further, the 
generality of findings is extended by having 
half the sample respond to stimulus adults 
rather than to stimulus teenagers. In some
analyses, the religion rather than race of 
the stimulus person is varied with belief 
and status. The larger and more heteroge- 
neous sample in the present study permits 
sex differences to be examined and sepa- 
rate analyses made for Jewish, Negro, 
white Protestant, and white Catholic sub- 
jects, the former two groups being unrepre- 
sented in the earlier study. 

The inclusion of religious affiliation as a 
variable has interesting implications in 
terms of Rokeach’s theory. Knowledge of a 
person’s religion yields information about 
central features of his probable beliefs. 
Thus, when both religion and belief (exem- 
plified in personal values) are varied in the 
presentation of stimulus persons, strong 
elements of the belief component are embedded
in the meanings attached to the religion
ascribed. If Rokeach is right, ascription
of religion rather than race might be
expected to have a large effect, in compari- 
son with similarity of belief. In addition, 
religious membership should be particu- 
larly salient for Jewish subjects, and race 
for Negro subjects, because of the emphasis 
on these factors in their upbringing. To pit 
religion and race, respectively, against sim- 
ilarity of belief, as determinants of these 
subjects’ responses to stimulus persons, is 
thus to test Rokeach's theory of belief 
prejudice under quite stringent conditions. 


METHOD 


Preparation of Questionnaires 


Each subject received a personally tailored 
questionnaire built around supposed excerpts from 
the replies of four teenagers or adults who had 
allegedly filled out the same research questionnaire 
that the subject himself had completed in the
eighth grade. Whether the stimulus persons whom 
any one subject received were teenagers or adults 
depended upon the form of the questionnaire that 
the subject had previously filled out. This original
questionnaire contained either questions regarding
the subject’s feelings about teenagers or
parallel questions about adults, and subjects at 
that time had been randomly assigned to receive 
one form or the other. 

The instructions which appeared on the first 
page of the questionnaire were as follows: (Form 
A) 

As you will probably remember, about a year 
ago we asked you to answer some questions con- 
cerning your interests and attitudes about your- 
self, your friends, and certain groups of people. 
You may also recall that there were some ques- 
tions asking you to give first impressions about 
people when you knew only a few things about 
them, such as the person’s religion or type of 
job. We are very much interested in how people 
form these impressions. 

In fact, we would like to know how you 
would feel about some adults who took a simi- 
lar questionnaire to the one you answered, but 
in other parts of the country. Therefore, we have 
taken some of their answers and presented them 
on the following pages. 

We want you to look at the descriptions of 
four adults and answer some questions about
how you feel toward each of them. It is impor- 
tant that you go through this booklet in order. 
Do not skip ahead, and once you have finished 
answering questions about a person, do not go 
back. 

If you have any questions, please raise your 
hand and your teacher will help you. Be sure 
to read everything carefully. And remember, feel 
free to answer the questions exactly the way you 
feel, for no one but the research workers at the 
University of California will see your answers. 

The instructions and basic format of the questionnaire
follow the plan of the questionnaire used
by Stein et al. (1965), with only minor modifica- 
tions. Stimulus persons were in each case the same 
sex as the subject. The presentation of each stimu- 
lus person contained information about belief, race 
or religion, and status. 

The value items for presenting the belief sys- 
tem of a stimulus adult were as follows: 

Do you think people in general ought to... 

1. Be loyal to the U.S. more than to any other group or cause.
2. Be interested in doing things in their community; be useful citizens.
3. Be unconcerned with making a great deal of money.
4. Be intelligent and well informed, be able to think clearly about things.
5. Keep their property in good condition; not let things get run down.
6. Have good taste in clothes.
7. Be concerned about other people; not be self-centered.
8. Be modest, not try to draw attention to themselves.
9. Support movements or groups that are working for equal rights for everyone.
10. Be sincerely religious.
11. Have respect for other people's wishes and beliefs; not be bossy.
12. Let everyone have his fair share in running business and politics in this country.
13. Be honest and trustworthy.
14. Be generally friendly and sociable; mix with different kinds of people.
15. Treat other people as equals; not be conceited or snobbish.
16. Follow all the rules and laws that have been made by those in authority.
17. Stay in groups or neighborhoods where they are welcome; not be “social climbers.”
18. Live up to strict moral standards.
19. Be hard working, not lazy.
20. Go along with what most others do and stand for; not be too different.


These items were followed by five columns of response
alternatives headed by “Strongly feel they
should” to “Strongly feel they shouldn’t” with
“Don’t care” as the middle point. The experimenter
circled the appropriate alternative for each
item, as designated by the computer program (see
p. 5), to give the subject the impression of the
stimulus person’s responses to these items.

The information, other than values, provided to 
describe a stimulus adult was as follows: (Form 
for Religion) 

1. [illegible]
2. Age ____
3. What is your job called? ____
4. How much education did you have?
   ____ some grade school
   ____ finished the 8th grade
   ____ some high school
   ____ graduated from high school
   ____ some college
   ____ graduated from college
   ____ some further education after college
5. What is your religion?
   ____ Protestant ____ Catholic ____ Jewish ____ other ____ none

A copy of the complete Form T questionnaire 
appears in Stein (1965). The factors and their cor- 
responding levels are given as follows: 

Race: White versus Negro 

Religion: Protestant or Catholic versus Jewish

Belief: Similar values versus dissimilar values 

Status: High versus low 

The four stimulus persons, who varied in terms 
of the factors of race and religion, were assigned to 
each subject following the plan presented in Table 
1. Each pair of stimulus persons indicated in the
table was composed of one high in status and one
low in status.




Description of stimulus persons: Belief factor. 
Each subject had filled out a 20-item value scale 
concerning “how people ought to be” (Form A) 
or “how fellow students ought to be” (Form T), 
as part of the Interest and Attitude questionnaire 
given when he was in the eighth grade. (For the 
adult form, see above; for the teenage form, see 
Stein et al. [1965, p. 283], with the omission of 
Items 1, 3, 17, 21, and 22 because of small response 
variance.) An IBM 7090 computer program was 
written so that each subject’s original responses to 
these items could be presented systematically in 
such a way as to make two sets that were similar 
and two that were dissimilar to the subject’s original
responses. The basic procedure for this program
appears in Stein (1965, pp. 94-97; for complete
details write to the author). Thus, beliefs
were ascribed to the four stimulus persons pre- 
sented to any given subject on the basis of the 
subject’s own responses to these same items. One 
of the four stimulus persons was always described 
with exactly the same responses that the subject 
originally gave to the items. In order to avoid 
raising the subject’s suspicions, the other like- 
valued stimulus person was made to differ slightly 
from the first by changing a few responses one 
step on the 5-point scale. The two unlike-valued 
stimulus persons were prepared by making more 
radical changes, again using the subject’s own re- 
sponses as the reference point. In addition, the 
program randomly varied the order of the two 
like-valued patterns and the two unlike-valued 
patterns.
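
The manipulation of the belief factor can be sketched in modern terms as follows. This is only an illustration under stated assumptions and is not the IBM 7090 program: the number of items altered (n_slight, n_radical), the rule used for a "radical" change (a jump to the more distant end of the scale), and all function names are hypothetical. What comes from the text is that one pattern is identical to the subject's own responses, one differs by a few one-step changes, two differ more radically, and the order within each pair is randomized.

```python
import random

SCALE_MIN, SCALE_MAX = 1, 5  # the 5-point response scale

def one_step(value, rng):
    """Move a response exactly one step, staying on the scale."""
    if value == SCALE_MIN:
        return value + 1
    if value == SCALE_MAX:
        return value - 1
    return value + rng.choice((-1, 1))

def far_from(value):
    """Assumed 'radical' change: jump to the more distant scale end."""
    return SCALE_MAX if value <= 3 else SCALE_MIN

def make_patterns(own, n_slight=3, n_radical=12, seed=0):
    """Return (like, unlike): two like-valued and two unlike-valued
    response patterns built from the subject's own responses."""
    rng = random.Random(seed)
    items = range(len(own))

    identical = list(own)
    slight = list(own)
    for i in rng.sample(items, n_slight):
        slight[i] = one_step(slight[i], rng)

    unlike = []
    for _ in range(2):
        pattern = list(own)
        for i in rng.sample(items, n_radical):
            pattern[i] = far_from(pattern[i])
        unlike.append(pattern)

    like = [identical, slight]
    rng.shuffle(like)    # randomize order of the two like-valued patterns
    rng.shuffle(unlike)  # and of the two unlike-valued patterns
    return like, unlike

own_responses = [1, 2, 3, 4, 5] * 4  # a made-up 20-item response set
like, unlike = make_patterns(own_responses)
```

Anchoring every pattern to the subject's own responses, rather than to a fixed set of "typical" values, is what makes similarity of belief a within-subject manipulation in this design.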

Description of stimulus persons: Race or religion
and status factors. The information about
race or religion and status was also presented by 
checks in the appropriate spaces, as if representing 
questionnaire responses. Each stimulus person was 
described in terms of either race or religion. 

For those subjects who responded to the adult 
form of the questionnaire, occupation and education
were used as indexes of status. A “doctor” or
“lawyer,” randomly interchanged, was combined

TABLE 1
ASSIGNMENT OF STIMULUS PERSONS
TO SUBJECTS

                              |                Stimulus persons
Membership of subject         | Jewish | Negro | Protestant | Catholic | White
Jewish                        |   2    |   —   |     2      |    —     |   —
Negro                         |   —    |   2   |     —      |    —     |   2
Half of the white Protestants |   2    |   —   |     2      |    —     |   —
Half of the white Protestants |   —    |   2   |     —      |    —     |   2
Half of the white Catholics   |   2    |   —   |     —      |    2     |   —
Half of the white Catholics   |   —    |   2   |     —      |    —     |   2



with “some further education after college” in the 
descriptions of high-status males. For low-status 
males, a “factory worker” and “truck driver” were 
randomly interchanged, and the amount of edu- 
cation attributed to them was “some grade school.” 
A high-status female was presented as either an 
“executive secretary” or a “dress designer” with 
“some further education after college.” A low- 
status female was depicted as a “factory worker” 
or a “waitress” with an education of “some grade 
school.” Each adult stimulus person was described 
as either 34 or 36 years old. 

All teenage stimulus persons were described as 
in the ninth grade (same grade as the subjects). 
Status was indicated by program in school and 
last year's grade average: “college preparatory” 
and “getting about a ‘B’ average” for high status, 
and "vocational" and “getting below a ‘D’ aver- 
age" for low status. 


Experimental Design 


Given the variables with which we are con- 
cerned, eight possible stimulus persons could be 
constructed. Excessive time demands (as indi- 
cated by a pilot study) made it impractical for 
each subject to respond to all eight combinations. 
Therefore, a 2 × 2 × 2 factorial design in blocks
of size 4 (repeated measures) was employed
(Winer, 1962, pp. 409-412). According to this de- 
sign, the comparisons involving race, for example, 
are as follows: 

One-half of the subjects received the following 
four stimulus persons: 

Group I 

White, unlike values, lower status 

Negro, like values, lower status 

Negro, unlike values, upper status 

White, like values, upper status 

The other half of the subjects received the following
four stimulus persons: 

Group II 

Negro, unlike values, lower status 

White, like values, lower status 

White, unlike values, upper status 

Negro, like values, upper status 

In the comparisons involving religion, Jewish 
and Protestant or Catholic were substituted for 
Negro and white, respectively. (See Table 1; note
in the religion comparisons that "same versus dif- 
ferent” religion is the basis for ascribing religion 
to the stimulus persons. Thus, Catholic subjects 
responded to Catholic and Jewish stimulus per- 
sons and Protestant subjects responded to Protes- 
tant and Jewish stimulus persons.) 

Within each subsample (the 24 cells in Table 
2), each subject was randomly assigned to Group 
I or II above and the order of presentation of 
stimulus persons within both Groups I and II was 
randomly varied, with the restriction that no two 
“like-valued” or “unlike-valued” stimulus persons 
ever appeared consecutively in a questionnaire. 
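The assignment and ordering rules just described can be illustrated with a short sketch. It assumes an exact half-and-half split of subjects between Groups I and II and uses hypothetical names throughout; the original program is not documented in the text. With two like-valued and two unlike-valued persons per block, the restriction on consecutive presentations forces the two kinds to alternate.

```python
import random

# The two half-fractions of the 2 x 2 x 2 design, as listed for the race comparisons.
GROUPS = {
    "I": [("White", "unlike", "lower"), ("Negro", "like", "lower"),
          ("Negro", "unlike", "upper"), ("White", "like", "upper")],
    "II": [("Negro", "unlike", "lower"), ("White", "like", "lower"),
           ("White", "unlike", "upper"), ("Negro", "like", "upper")],
}


def order_block(block, rng):
    """Random order in which like- and unlike-valued persons alternate."""
    like = [p for p in block if p[1] == "like"]
    unlike = [p for p in block if p[1] == "unlike"]
    rng.shuffle(like)
    rng.shuffle(unlike)
    # Randomly decide which kind opens the questionnaire.
    first, second = (like, unlike) if rng.random() < 0.5 else (unlike, like)
    return [first[0], second[0], first[1], second[1]]


def assign_questionnaires(subject_ids, rng):
    """Map each subject to (group label, ordered list of four stimulus persons)."""
    ids = list(subject_ids)
    rng.shuffle(ids)                 # random assignment to groups
    half = len(ids) // 2             # assumption: split as evenly as possible
    plan = {}
    for position, sid in enumerate(ids):
        group = "I" if position < half else "II"
        plan[sid] = (group, order_block(GROUPS[group], rng))
    return plan
```

Each group contains every level of race, belief, and status exactly twice, which is what allows the three main effects to be estimated from only four stimulus persons per subject.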

TABLE 2
SAMPLE (N = 630)

Dependent variables. After the description of
each stimulus person, the following sets of
questions were asked:

1. Friendliness. The first question, which measured
the subject's overall affective reaction to the
stimulus person and which might be considered a
measure of the affective component of a prejudiced
or unprejudiced attitude, appeared as follows:

If you met this person for the first time, what
would your immediate reaction be?
I think I would feel:
___ quite friendly
___ a little friendly
___ nothing either way
___ a little unfriendly
___ quite unfriendly

2. Social distance. Next, subjects responded to
a 10-item social distance scale. The items on the
scale are appropriately designed for the teenage
form as well as the adult form. (See Stein et al.,
1965, p. 287, for the complete wording of Form T
items or Table 11 hereafter for a paraphrased version.)
The items on the adult form were as follows:

Do you think you would be willing to . . .
Yes No
___ ___ have this person as a neighbor on your street
___ ___ work on a charity fund-raising drive with this person
___ ___ have this person as one of your speaking acquaintances
___ ___ go to a party to which this person was invited
___ ___ have this person as a member of your social group or club
___ ___ live in the same apartment house as this person
___ ___ have this person as a close personal friend
___ ___ invite this person to your home for dinner
___ ___ have a close relative marry this person
___ ___ share an apartment with this person

3. Check on the manipulation. Thirdly, a question
was asked regarding how much like the stimulus
person the subject saw himself. This question
appears as follows:

How much like you would you say this person is?
___ as much like me as any person I can think of
___ very much like me
___ a little like me
___ a little unlike me
___ very much unlike me
___ as much unlike me as any person I can think of

Since the stimulus persons were designed to be
“like” or “unlike” the subject, it was necessary to
find out if subjects so perceived them.

Sample

The final sample consisted of 630 ninth-grade
students. Table 2 classifies the subjects by ethnic
(racial or religious) affiliation, sex, and form of
the questionnaire, and by the particular stimulus 
persons who appeared in the subjects’ question- 
naires. In order to obtain sufficiently large cell en- 
tries, subjects were combined across the two junior 
high schools and Protestants and Catholics were 
combined (as “gentiles”) for most of the analyses. 


Administration of Questionnaires 

The questionnaires were administered during 
the subjects’ regular class in social studies. The 
investigator met with the social studies faculty of 
each school the day before the administration to 
go over the instructions and to answer any questions.
Seven social studies teachers at one of the
junior high schools and nine at the other admin- 
istered the procedure. 

Each subject received his questionnaire in a 
sealed envelope. His name appeared on the en- 
velope but his questionnaire was identified only 
by a code number. The teachers were told to 
throw away the envelopes as soon as the students 
had removed their questionnaires. Subjects were 
told that the code number was necessary for sta- 
tistical analyses, that no student would be con- 
sidered individually or by name, and that only 
the research workers at the University of Cali- 
fornia would see their answers. 


RESULTS AND DISCUSSION 

The 2 x 2 x 2 factorial design used for 
the analysis of the check on the manipula- 
tion and for the two dependent variables of 
liking and social distance can demonstrate 
only whether or not a given independent 
variable has a significant effect on the de- 
pendent variable. In order to determine 
whether one treatment effect is significantly
greater than another, it is necessary
to calculate the proportion of total variance 
contributed by each treatment effect and 
then to test for any significant difference 
between effects. 

The index, ω², expresses the strength of
association between independent and dependent
variables in terms of the proportion
of total variance accounted for by the
treatment effect (Hays, 1963). The ω² values
were computed for each sample and for
each treatment effect in the present study.
The ω² values for any two selected factors
were then ranked across samples in order 
of magnitude and White's rank test (Ed- 
wards, 1954) was applied to determine 
whether one factor contributed a signifi- 
cantly greater proportion of the variance 
than the other. Two-tailed tests were used 
in all cases. 
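The two-step procedure described above can be sketched as follows. The sketch assumes that the proportion-of-variance index is Hays's estimate, (SS_effect - df_effect × MS_error) / (SS_total + MS_error), and that White's rank test is the two-sample rank-sum procedure; neither formula is spelled out in the text, and the function names are hypothetical. The illustrative values are the proportions of variance for belief and for race in the eight race samples of Table 3.

```python
def omega_squared(ss_effect, df_effect, ms_error, ss_total):
    """Estimated proportion of total variance accounted for by one effect."""
    estimate = (ss_effect - df_effect * ms_error) / (ss_total + ms_error)
    return max(estimate, 0.0)  # negative estimates are conventionally reported as zero


def rank_sum(first, second):
    """Sum of the pooled ranks of `first`, using average ranks for ties."""
    pooled = sorted(first + second)
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    return sum(ranks[v] for v in first)


# Proportions of variance from the race samples of Table 3.
belief = [0.10, 0.39, 0.08, 0.47, 0.21, 0.40, 0.20, 0.34]
race = [0.00] * 8
belief_rank_sum = rank_sum(belief, race)  # largest possible value for 8 vs. 8 samples
```

The rank sum would then be referred to tables of the rank-sum distribution for the two sample sizes; the lookup itself is omitted here.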




Throughout the discussion, trends in the 
differential effect of treatments on related 
subsamples will be pointed out. Quite often, 
though, significance tests could not be 
meaningfully applied to these data since 
they were based on so few samples. 


Check on the Manipulation 


The subjects’ responses to the question: 
“How much like you would you say this 
person is?" served as a check as to whether 
the stimulus persons appeared like or un- 
like the subjects as intended. A summary 
of the 16 analyses of variance for responses 
to this question appears in Table 3. The 
main effect of belief accounts for almost all
of the variance, contributing significantly
more variance than either the race, the religion,
or the status effect (p < .01 for all
rank-order comparisons). Subjects tended 
to see themselves as similar or dissimilar to 
the stimulus persons mainly in terms of be- 
lief. Means and standard deviations for 
many of these analyses are omitted here to 
conserve space, but are presented in Stein 
(1965). In this analysis, like-valued per- 
sons are perceived as more like the subject 
than unlike-valued ones. 

In the analysis involving the religion 
variable, the small proportion of variance 
not accounted for by the belief effect is di- 
vided about equally between status and 
religion. The fact that status effects ac- 
count for more variance on Form T than on 
Form A (p < .05 for the rank-order differ- 
ence) suggests that the status attributes of 
the stimulus teenagers are more salient to 
the subjects than the corresponding status
attributes of the adult stimulus persons.
The two Form A samples for which the
status effect was significant are the gentile
and Jewish males. This finding seems rea- 
sonable since the manipulation of the adult 
stimulus persons' status may well have been 
more powerful for males than for females. 
That is, the difference between “doctor” 
and “lawyer” on the one hand and between 
“factory worker” and “truck driver” on the 
other appears to be greater than that be- 
tween “dress designer” and “executive sec- 
retary” as opposed to “factory worker" and 
“waitress.”

The fact that three of the four Jewish 




samples show significant status effects 
seems in line with the probable stress on 
this factor in Jewish families. It is com- 
mon knowledge that middle-class Jewish 
parents tend to hope that their children 
will get good grades, go to college, and en- 
ter into professions. The high-status stimu- 
lus persons are presented as successful in 
fulfilling such expectations. 

The results concerning the religion vari- 
able have some interesting implications. In 
four of the eight relevant samples there was 
a significant religion effect; whereas in no 
sample was there a significant race effect 
(p < .05 for the rank-order difference between
ω² values for religion and race). This
fact is in accord with the expectation that 
there is a meaningful belief component in 
the ascription of religion. Knowing merely 
that a person is Protestant, Catholic, or 
Jewish may imply much about the person’s 


beliefs. Thus it is not surprising that judg- 
ments of similarity are more frequently 
based at least in part on the religion of the 
subject than on his race, which implies 
much less about belief systems. 


Friendliness 


Table 4 shows the summary of the analy- 
ses of variance computed on the responses 
of the 16 subsamples to the “friendliness” 
question, intended as a measure of “affect” 
toward the stimulus person. 

As may be seen in the column headed 
Belief, the belief component of the stimu- 
lus individuals accounted for almost all of 
the variance in responses to this question. 
The belief effect contributes significantly 
more variance than either the race, the re- 
ligion, or the status effects (p < .01 for all 
rank-order comparisons). In 15 of the 16 
samples there was a highly significant belief
effect (p < .001), and the belief effect
in the other sample (gentile males, Form
T) was significant at better than the .01
level. (Note that this is the same sample
for which the manipulation of similarity
appeared to be ineffective.)

TABLE 3
SUMMARY OF THE 16 ANALYSES OF VARIANCE FOR RESPONSES TO THE QUESTION: HOW MUCH LIKE YOU
WOULD YOU SAY THIS PERSON IS? (AS MUCH LIKE ME TO AS MUCH UNLIKE ME; 6-POINT SCALE)

Sample          | Form | N  | Race F      | Prop. var. | Belief F   | Prop. var. | Status F  | Prop. var. | Race × Belief F     | Race × Status F     | Belief × Status F
Negro males     | A    | 25 | .26         | .00        | 17.85****  | .10        | 1.79      | .00        | .01                 | .01                 | .01
Negro females   | A    | 25 | .98         | .00        | 66.37****  | .39        | .10       | .00        | .58                 | .01                 | .01
Gentile males   | A    | 30 | 1.04        | .00        | 15.26****  | .08        | 4.15*     | .02        | 4.88*               | 2.88                | .72
Gentile females | A    | 33 | .28         | .00        | 135.19**** | .47        | .03       | .00        | .03                 | 1.12                | .28
Negro males     | T    | 23 | .56         | .00        | 31.62****  | .21        | 2.25      | .01        | .88                 | 4.26                | .88
Negro females   | T    | 24 | .12         | .00        | 68.84****  | .40        | 7.98***   | .04        | 1.12                | .28                 | 1.12
Gentile males   | T    | 26 | .12         | .00        | 35.80****  | .20        | 7.93***   | .04        | .03                 | 2.51                | [illegible]
Gentile females | T    | 26 | .08         | .00        | 64.08****  | .34        | 7.12***   | .03        | 1.43                | .68                 | 1.43

Sample          | Form | N  | Religion F  | Prop. var. | Belief F   | Prop. var. | Status F  | Prop. var. | Religion × Belief F | Religion × Status F | Belief × Status F
Jewish males    | A    | 69 | 15.82****   | .03        | 148.83**** | .28        | 18.01**** | .03        | 1.42                | 2.13                | .44
Jewish females  | A    | 68 | 3.08        | .00        | 350.25**** | .54        | 1.05      | .00        | .02                 | .08                 | 1.37
Gentile males   | A    | 32 | 5.15*       | .02        | 48.59****  | .24        | .24       | .00        | 1.68                | .42                 | .24
Gentile females | A    | 30 | .17         | .00        | 121.88**** | .41        | 2.13      | .00        | 1.08                | .04                 | 1.56
Jewish males    | T    | 84 | 9.20***     | .01        | 151.35**** | .26        | 70.75**** | .12        | 0.0                 | .40                 | 4.98*
Jewish females  | T    | 88 | .24         | .00        | 350.10**** | .42        | 64.02**** | .08        | .74                 | .06                 | 10.24***
Gentile males   | T    | 20 | 3.60        | .03        | .99        | .00        | .81       | .00        | .81                 | 0.0                 | .88
Gentile females | T    | 27 | 9.73***     | .05        | 31.54****  | .18        | 5.79*     | .03        | .07                 | .64                 | 2.30

* p < .05.
*** p < .01.
**** p < .001.

TABLE 4
SUMMARY OF THE 16 ANALYSES OF VARIANCE FOR RESPONSES TO THE QUESTION: IF YOU MET THIS
PERSON FOR THE FIRST TIME, WHAT WOULD YOUR IMMEDIATE REACTION BE? (QUITE FRIENDLY TO
QUITE UNFRIENDLY; 5-POINT SCALE)

Sample          | Form | N  | Race F      | Prop. var. | Belief F   | Prop. var. | Status F  | Prop. var. | Race × Belief F     | Race × Status F     | Belief × Status F
Negro males     | A    | 25 | 2.67        | .01        | 21.96****  | .14        | .30       | .01        | .30                 | .96                 | .11
Negro females   | A    | 25 | 3.54        | .02        | 55.34****  | .34        | .04       | .00        | .81                 | .20                 | .65
Gentile males   | A    | 30 | 2.06        | .00        | 20.50****  | .10        | 5.38*     | .02        | 5.38*               | .01                 | [illegible]
Gentile females | A    | 33 | .50         | .00        | 80.31****  | .34        | .82       | .00        | .25                 | 2.28                | .50
Negro males     | T    | 23 | .25         | .00        | 23.75****  | .17        | .01       | .00        | 1.20                | 2.22                | .80
Negro females   | T    | 24 | .01         | .00        | 41.54****  | .27        | .97       | .00        | 2.02                | .01                 | 1.44
Gentile males   | T    | 26 | .48         | .00        | 38.56****  | .21        | 6.40*     | .03        | 0.0                 | .85                 | 0.0
Gentile females | T    | 26 | .18         | .00        | 42.58****  | .25        | 4.43*     | .02        | 1.60                | .04                 | [illegible]

Sample          | Form | N  | Religion F  | Prop. var. | Belief F   | Prop. var. | Status F  | Prop. var. | Religion × Belief F | Religion × Status F | Belief × Status F
Jewish males    | A    | 69 | 10.43***    | .02        | 116.41**** | .25        | 5.78*     | .01        | 1.06                | 1.36                | .57
Jewish females  | A    | 68 | 1.14        | .00        | 224.59**** | .42        | 1.14      | .00        | .37                 | .02                 | .09
Gentile males   | A    | 32 | 2.23        | .01        | 26.16****  | .17        | .15       | .00        | .15                 | 2.23                | .50
Gentile females | A    | 30 | .22         | .00        | 78.92****  | .34        | .22       | .00        | .87                 | 1.97                | .22
Jewish males    | T    | 84 | 2.66        | .00        | 113.21**** | .22        | 25.20**** | .05        | 0.0                 | .96                 | .96
Jewish females  | T    | 88 | .02         | .00        | 229.85**** | .36        | 20.80**** | .03        | .74                 | 1.23                | .74
Gentile males   | T    | 20 | .80         | .00        | 9.19****   | .08        | 1.00      | .00        | 7.49***             | .62                 | .02
Gentile females | T    | 27 | 3.46        | .01        | 44.56****  | .24        | 5.28*     | .02        | .97                 | .59                 | 4.32*

* p < .05.
*** p < .01.
**** p < .001.

The results for this sample in all analyses 
should be viewed with great caution. Of the 
16 subsamples in the study, this one has the
smallest N (20). Besides, when this sample 
was divided so that approximately half of 
the subjects would receive four treatments 
and the other half of the subjects the other 
four treatments, the actual split came out 
to be 13 and 7 instead of the desired 10 and 
10 because some subjects were absent or 
had transferred to another school. The 
Winer (1962) model assumes an equal 
number of subjects in each group. If N is 
large and the difference between N₁ and N₂
in each group is small, statistical assumptions
for the model are not seriously violated.
Departures from this rule, as in this
case, reduce the power of any statistical 
test that might be applied to the data. The 
use of a General Linear Hypothesis Model 
(Biomedical Computer Programs, 1961) is 
recommended in such cases and was in fact 
carried out for this sample as well as for 
the Negro females, Form A (N = 25; N₁ =
15, N₂ = 10). The F ratios reported in
Tables 3, 4, and 9 for these samples were 
derived by this procedure. 

The race effect was not significant in any 
of the eight samples in which race was var- 
ied. The religion effect was significant in 
only one of the eight samples in which it 
was tested (Jewish males, Form A). The 
status effect was significant in 7 out of 16 
samples, with generally lower significance 
levels than those for belief. That is, five of 
these seven tests for status were significant 
only at the .05 level while the other two
reached the .001 level. Moreover, rank- 
order tests between the amount of variance 
explained by these three factors showed no 
significant differences. 

There are no apparent sex differences on 
the status factor. Five of the seven samples 
that showed significant status effects had 
responded to Form T—a trend paralleling 
findings previously reported in regard to 
the effectiveness of the manipulations. 
Three out of the four Jewish samples show 
significant status effects although very little 
variance is contributed by these samples. 
Only 3 of the 48 tests for two-way inter- 
actions were significant, a finding that 
could easily have arisen by chance, espe- 
cially since there is no correspondence be- 
tween the groups showing such interaction 
effects in this table and in Table 3 concern- 
ing the check on the manipulation. 

These findings lend strong support to 
Rokeach’s theory. Subjects’ affective re- 
sponses to stimulus persons are much more 
strongly influenced by ascribed similarity 
of belief systems than by ascribed religion 
or race. 


Correlational Analysis of Responses to the 
Friendliness Item 


Adult Negro stimulus persons. Essen- 
tially the same “friendliness” question had 
also been asked, in a somewhat different 
format, on the Interest and Attitude ques- 
tionnaire, given to the same subjects a year 
before the present study. At that time, sub- 
jects had been asked to respond to a list of 
many different categories of persons, of 
which two were “a Negro” (or “a Negro 
teenager") and “a Jew" (or “a Jewish 
teenager"). It had been originally hypothe- 
sized (Stein et al., 1965, p. 286) that sub- 
jects’ responses to the Negro teenager 
should correlate moderately both with re- 
sponses to a like-valued Negro and an un- 
like-valued Negro unless, for some reason, 
subjects have an expectation that Negro 
teenagers in general are either like them or 
unlike them in values. The finding that re- 
sponses to “a Negro teenager" correlated 
highly with responses to a Negro teenager 
with unlike values but not with responses 
to a Negro with like values was of con- 


siderable import. Therefore, a similar anal- 
ysis was carried out in the present study. 

Of the 630 subjects, 572 had answered the 
appropriate items in the previous question- 
naire. The 572 subjects represent 16 sub- 
samples, of which 4 each responded to “a 
Negro,” “a Negro teenager,” “a Jew,” and 
“a Jewish teenager.” 

In the study by Stein et al., mean re- 
sponses to “a Negro teenager” fell be- 
tween the means for the other two experi- 
mentally presented stimulus teenagers. It 
seemed reasonable that subjects should 
feel friendliest toward a Negro with like 
values, followed by the unspecified “Negro 
teenager,” and finally by the least pre- 
ferred stimulus teenager, the Negro with 
unlike values. One might expect similar 
outcomes among the subsamples in the 
present study unless the subjects are af- 
fected by the addition of status as a vari- 
able rather than as a constant in the 
description of like- and unlike-valued stimu- 
lus persons, or by the fact that stimulus 
adults elicit different responses than stim- 
ulus teenagers, or unless Negro and Jewish 
subjects respond differently from white 
gentile subjects. A further difference is the 
fact that a year intervened between admin- 
istrations of the questionnaires in the 
present study, as compared with only 2 
months in the former one. 

Table 5 summarizes the results for the 
subjects in the four samples who re- 
sponded to “a Negro” (Form A). In the 
column headed X, the first six means re- 
flect the responses of Negro subjects who 
received Form A. For each of these two 
samples “a Negro” is the most preferred 
stimulus person, although these means are 
not significantly different from those for 
the like-valued Negro of either upper or 
lower status. Since these are Negro sub- 
jects, and this score was taken from the 
previous questionnaire when the "Negro" 
was embedded in a list of other stimulus 
persons, the salience of the Negro stimulus 
was apparently increased and the sub- 
jects responded quite favorably. Means 
for “a Negro” and the like-valued Negro 
are both significantly different from the 
means for the unlike-valued Negro. These 
results, therefore, are consistent with findings
in the former study. However, none of
the correlations between responses to the
possible combinations of stimulus persons
is significant.

TABLE 5
ANALYSIS OF RESPONSES ON THE “FRIENDLINESS” ITEM TOWARD NEGRO ADULT STIMULUS PERSONS

Sample(a) | N  | Stimulus person                    | Mean(c) | s²   | Comparison                     | r      | t between means | t between variances
Negro     | 23 | Negro, unlike values, lower status | 2.4     | 1.69 | Unlike, lower vs. like, upper  | -.15   | 2.66**          | 3.05***
          |    | Negro, like values, upper status   | 1.5     | .49  | Unlike, lower vs. a Negro(b)   | .17    | 3.92***         | [illegible]
          |    | A Negro(b)                         | 1.4     | .49  | Like, upper vs. a Negro(b)     | .41    | 0.53            | 0.0
Negro     | 23 | Negro, like values, lower status   | 1.4     | .36  | Like, lower vs. unlike, upper  | .14    | 4.14***         | 2.47*
          |    | Negro, unlike values, upper status | 2.4     | 1.0  | Like, lower vs. a Negro(b)     | -.05   | 0.44            | 0.0
          |    | A Negro(b)                         | 1.3     | .36  | Unlike, upper vs. a Negro(b)   | -.08   | 4.09****        | 2.45*
Gentile   | 28 | Negro, unlike values, lower status | 2.9     | .81  | Unlike, lower vs. like, upper  | .50*** | 4.53***         | 0.0
          |    | Negro, like values, upper status   | 2.1     | .81  | Unlike, lower vs. a Negro(b)   | -.10   | 2.54**          | 1.04
          |    | A Negro(b)                         | 2.2     | 1.21 | Like, upper vs. a Negro(b)     | -.28   | 0.23            | 1.07
Gentile   | 28 | Negro, like values, lower status   | 1.9     | .81  | Like, lower vs. unlike, upper  | -.41*  | 1.95            | 1.13
          |    | Negro, unlike values, upper status | 2.6     | 1.21 | Like, lower vs. a Negro(b)     | .22    | 0.82            | 0.0
          |    | A Negro(b)                         | 2.1     | .81  | Unlike, upper vs. a Negro(b)   | .05    | 1.72            | 1.03

(a) Each sample involves different subjects; boys and girls are combined.
(b) From questionnaire given when students were in the eighth grade.
(c) A low score indicates greater friendliness toward the stimulus person.

The final six means in the X column in 
Table 5 represent responses of the two 
“gentile” samples. Here the means are 
ordered as predicted, but responses to the 
unlike-valued Negro stimulus persons are 
significantly different from those to the 
other two stimulus persons for only the 
first gentile sample. In addition, two sig- 
nificant correlations emerge, neither of 
which was predicted. A correlation of .50 
(p < .01) between an unlike-valued Negro 
of lower status and a like-valued Negro of 
upper status does not make any apparent 
sense. Likewise in the other gentile sample, 
a correlation of —.41 (p < .05) between a 
like-valued Negro of lower status and an 
unlike-valued Negro of upper status is not 
readily interpretable. 
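The two t columns of Table 5 can be approximated from the tabled summary statistics alone. The sketch below is an illustration under stated assumptions: the text does not name the tests, so the usual t for correlated means and the standard t test for the difference between two correlated variances (with N - 2 degrees of freedom) are assumed, and the function names are hypothetical. Applied to the first row of Table 5 (means 2.4 and 1.5, variances 1.69 and .49, r = -.15, N = 23), the variance test gives about 3.06, close to the tabled 3.05; the means test gives about 2.76 against the tabled 2.66, a gap that may reflect rounding of the tabled means and variances.

```python
import math


def correlated_means_t(mean1, mean2, var1, var2, r, n):
    """t for the difference between two means measured on the same n subjects."""
    sd1, sd2 = math.sqrt(var1), math.sqrt(var2)
    var_diff = var1 + var2 - 2 * r * sd1 * sd2   # variance of the paired differences
    return (mean1 - mean2) / math.sqrt(var_diff / n)


def correlated_variances_t(var1, var2, r, n):
    """t (df = n - 2) for the difference between two correlated variances."""
    sd1, sd2 = math.sqrt(var1), math.sqrt(var2)
    return (var1 - var2) * math.sqrt(n - 2) / (2 * sd1 * sd2 * math.sqrt(1 - r ** 2))


# First Negro sample in Table 5: unlike values, lower status vs. like values, upper status.
t_means = correlated_means_t(2.4, 1.5, 1.69, 0.49, -0.15, 23)
t_variances = correlated_variances_t(1.69, 0.49, -0.15, 23)
```

When the two variances are equal the second statistic is exactly zero, which agrees with the 0.0 entries in the last column of the table.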

Teenage Negro stimulus persons. Table 
6 presents the results for subjects in the 
four samples who responded to a "Negro 
teenager" in addition to the experimen- 
tally presented Negro teenage stimulus per- 
sons. The two gentile samples in Table 6 
provide a replication of the corresponding 
analysis in the study by Stein et al. (1965). 
The first six means under the column 
headed X represent the responses of two 
Negro samples. These subjects again show 
greatest friendliness toward “a Negro teen- 
ager" rather than to the like-valued Ne- 
gro; the only mean difference that is not 
significant is between “a Negro teenager" 
and “Negro-like values—upper status." 
These results, then, are essentially con- 
sistent with the findings for the Negro sub- 
jects who took Form A. The one significant 
correlation obtained for these two Negro 
samples is .52 (p < .02) between responses
to “a Negro teenager” and to a
like-valued Negro of lower status. Al- 
though this result was not specifically pre- 
dicted, it seems to make sense. Negro sub- 
jects responded to “a Negro teenager” in 
much the same way as they did to a Negro 
who is like them in values and has lower 
status. Since the Negro subjects them- 
selves come mainly from lower class fami- 


lies, the attributes of this stimulus person 
actually resemble their own most closely.

The last six means under the column X 
in Table 6 present the scores for the two 
gentile samples who responded to “a Negro 
teenager.” For both samples, the means 
are ordered as predicted, although in the 
second sample, the unlike-valued Negro is 
not significantly different from the “Negro 
teenager.” It seems remarkable that these 
means order as predicted in almost all 
samples considering that a year separated 
the responses to “a Negro teenager” and
the other stimulus individuals. The only 
significant correlation that emerges is one 
of .53 (p < .01) between responses to “a 
Negro teenager" and an unlike-valued 
Negro of lower status, a result in good ac- 
cord with the rationale underlying the 
earlier findings, given the addition of status 
as a new variable, and the substantial ex- 
posure of the present samples (unlike the 
sample of the previous study) to Negroes, 
who were predominantly from lower class 
backgrounds. 

The fact that significant interpretable 
correlations were obtained with Form T 
but not with Form A suggests that both 
Negro and gentile samples found it more 
meaningful to respond to stimulus teen- 
agers than to stimulus adults. 

Adult Jewish stimulus persons. A sim- 
ilar correlational analysis was carried out 
for responses to the Jewish stimulus per- 
sons. In the results for Form A (Table 7), 
the first six means under the column 
headed X are the responses for two 
Jewish samples. There are significant 
mean differences between responses to “a 
Jew" and a like-valued Jew of either upper 
or lower status, on the one hand, and to an
unlike-valued Jew of upper or lower 
status, on the other. The only correlation 
that is significant is one between "a Jew" 
and a Jew with unlike values and upper 
status (r = .30, p < .01). This finding is
unexpected, and no explanation is offered. 

The mean responses for the two gentile
samples (last six means of Table 7) are
in the order predicted, with greatest
friendliness exhibited toward a like-valued
Jew, followed by “a Jew,” and finally by
an unlike-valued Jew. A sizable correlation
of .70 (p < .001) occurs between “a Jew”
and a Jew with like values and lower
status. Again, no explanation is offered for
these unpredicted results, which are unexpected
because the Jewish students in this school
system come from predominantly upper-middle-class
backgrounds.

TABLE 6
ANALYSIS OF RESPONSES ON THE “FRIENDLINESS” ITEM TOWARD NEGRO TEENAGER STIMULUS PERSONS

Sample(a) | N  | Stimulus person                    | Mean(c) | s²   | Comparison                            | r      | t between means | t between variances
Negro     | 19 | Negro, unlike values, lower status | 3.1     | 1.44 | Unlike, lower vs. like, upper         | .32    | 6.34****        | 3.20***
          |    | Negro, like values, upper status   | 1.4     | .36  | Unlike, lower vs. a Negro teenager(b) | .26    | [illegible]     | [illegible]
          |    | A Negro teenager(b)                | 1.3     | .36  | Like, upper vs. a Negro teenager(b)   | .45    | 0.37            | 0.0
Negro     | 23 | Negro, like values, lower status   | 1.8     | 1.0  | Like, lower vs. unlike, upper         | .20    | 3.62***         | 1.00
          |    | Negro, unlike values, upper status | 3.0     | 1.96 | Like, lower vs. a Negro teenager(b)   | .52**  | 3.21***         | [illegible]
          |    | A Negro teenager(b)                | 1.3     | .25  | Unlike, upper vs. a Negro teenager(b) | .06    | [illegible]     | [illegible]
Gentile   | 23 | Negro, unlike values, lower status | 3.1     | 1.44 | Unlike, lower vs. like, upper         | .24    | 6.08****        | 3.54****
          |    | Negro, like values, upper status   | 1.6     | .36  | Unlike, lower vs. a Negro teenager(b) | .53*** | 2.42*           | 0.47
          |    | A Negro teenager(b)                | 2.6     | 1.21 | Like, upper vs. a Negro teenager(b)   | .00    | 3.54***         | 2.95
Gentile   | 27 | Negro, like values, lower status   | 2.2     | 1.0  | Like, lower vs. unlike, upper         | .32    | 2.10*           | 0.0
          |    | Negro, unlike values, upper status | 2.7     | 1.0  | Like, lower vs. a Negro teenager(b)   | .23    | 1.66            | 0.0
          |    | A Negro teenager(b)                | 2.6     | 1.0  | Unlike, upper vs. a Negro teenager(b) | .28    | 0.23            | 0.0

(a) Each sample involves different subjects; boys and girls are combined.
(b) From questionnaire given when students were in the eighth grade.
(c) A low score indicates greater friendliness toward the stimulus person.
* p < .05.
** p < .02.
*** p < .01.
**** p < .001.

TABLE 7
ANALYSIS OF RESPONSES ON THE “FRIENDLINESS” ITEM TOWARD JEWISH ADULT STIMULUS PERSONS

Sample(a) | N  | Stimulus person                     | Mean(c) | s²   | Comparison                    | r       | t between means | t between variances
Jewish    | 56 | Jewish, unlike values, lower status | 2.8     | .81  | Unlike, lower vs. like, upper | -.13    | 7.63****        | 1.88
          |    | Jewish, like values, upper status   | 1.6     | .49  | Unlike, lower vs. a Jew(b)    | .09     | 8.06****        | 1.87
          |    | A Jew(b)                            | 1.6     | .49  | Like, upper vs. a Jew(b)      | .09     | 0.27            | 0.0
Jewish    | 63 | Jewish, like values, lower status   | 1.7     | .49  | Like, lower vs. unlike, upper | -.16    | [illegible]     | [illegible]
          |    | Jewish, unlike values, upper status | 2.7     | 1.21 | Like, lower vs. a Jew(b)      | .08     | 0.0             | 1.05
          |    | A Jew(b)                            | 1.7     | .64  | Unlike, upper vs. a Jew(b)    | .30***  | 6.74            | 2.65**
Gentile   | 31 | Jewish, unlike values, lower status | 2.7     | .64  | Unlike, lower vs. like, upper | .19     | 6.34****        | 0.74
          |    | Jewish, like values, upper status   | 1.5     | .49  | Unlike, lower vs. a Jew(b)    | -.08    | 2.84****        | 1.22
          |    | A Jew(b)                            | 2.0     | 1.0  | Like, upper vs. a Jew(b)      | .29     | 2.45*           | 2.05*
Gentile   | 24 | Jewish, like values, lower status   | 1.9     | .64  | Like, lower vs. unlike, upper | .09     | 5.29****        | 1.52
          |    | Jewish, unlike values, upper status | 3.3     | 1.21 | Like, lower vs. a Jew(b)      | .70**** | [illegible]     | 2.13*
          |    | A Jew(b)                            | 2.2     | 1.21 | Unlike, upper vs. a Jew(b)    | -.18    | 3.29***         | 0.0

(a) Each sample involves different subjects; boys and girls are combined.
(b) From questionnaire given when students were in the eighth grade.
(c) A low score indicates greater friendliness toward the stimulus person.
* p < .05.
** p < .02.
*** p < .01.
**** p < .001.

Teenage Jewish stimulus persons. The 
analysis of responses to the Jewish teen- 
ager stimulus persons appears in Table 8. 
For the first six means, “a Jewish teen- 
ager” receives the most friendly responses 
for the two Jewish samples. This finding 
parallels a corresponding result for the 
Negro sample and can probably be ac- 
counted for by the salience to Jewish stu- 
dents of the Jewish stimulus as embedded 
in the list of other categories of persons 
presented for reaction in the original Teen- 
age Interest and Attitude Questionnaire. 
Only in the second sample are responses 
to “a Jewish teenager” significantly differ- 
ent from those to the like-valued Jew. But 
in both samples, responses to the unlike- 
valued Jew differ significantly from those 
to both of the other two stimulus teenagers, 
a finding similar to those obtained in the 
other samples. No significant correlations 
appear. The two gentile samples have 
means that follow the expected order, but 
in the second sample the only significant 
difference is that between the Jew with 
like values and lower status and the un- 
like-valued Jew. A correlation of .41 
(p < .05) occurs between responses to the 
unlike-valued Jew of lower status and the 
like-valued Jew of upper status. This cor- 
relation is almost the same as that for the 
equivalent pair of Negro adult stimuli. 
Again, no explanation is offered to account 
for it. 

The correlational data for the Jewish 
stimulus persons fail to confirm the predic- 
tions generally verified in the analyses of 
the data concerning Negro stimulus per- 
sons. Gentile subjects are more prone to 
express friendliness towards otherwise un- 
described Jewish stimulus persons than 
toward Negro stimulus persons. The in- 
terpretation by Stein et al. (1965) with 
regard to assumed dissimilar belief systems
is thus confirmed only for the single
analysis that exactly replicates the former
study—that for teenager Negro stimulus 
persons; gentile subjects tend to respond to 
an otherwise undescribed Negro teenager 
in the same manner as they do to a Negro 
who is unlike them in values and has lower 
status. 


Total Social Distance Scale 


Among our measures the 10-item social 
distance scales are probably the best indicators
of the subjects’ willingness to engage
socially in real-life situations with
persons similar to the stimulus persons. 
Total scores on the scales were obtained by 
summing responses to the ten items, each 
scored 1 for “Yes” and 0 for “No.” Scalo- 
gram analyses with other data showed that 
the social distance scales form very highly 
reproducible Guttman scales. If the sub- 
ject omitted no more than three responses 
to a scale, a “Yes” or “No” response was 
randomly assigned to each omitted item 
to facilitate computer analyses. Sixty-one 
subjects failed to answer enough of the 
questions basic to the present study and 
were therefore deleted from the analysis. 
A separate analysis of the individual items 
of the social distance scales will be pre- 
sented in the next section. 
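
The scoring rule just described can be sketched in a few lines of Python. This is an illustration only, not the program used in the study: the function name `total_social_distance`, the use of `None` to mark an omitted item, and the choice to return `None` for a subject with more than three omissions are all assumptions made here for the sake of the example.

```python
import random
from typing import Optional, Sequence

MAX_OMITTED = 3  # from the text: no more than three omitted responses per scale


def total_social_distance(
    responses: Sequence[Optional[str]], rng: random.Random
) -> Optional[int]:
    """Sum a ten-item scale, scoring "Yes" as 1 and "No" as 0.

    An omitted item is represented by None (an assumption of this sketch).
    If at most MAX_OMITTED items are omitted, each omitted item is given a
    randomly assigned "Yes" or "No"; otherwise None is returned to signal
    that the scale cannot be scored for this subject.
    """
    if len(responses) != 10:
        raise ValueError("expected responses to ten items")
    omitted = sum(r is None for r in responses)
    if omitted > MAX_OMITTED:
        return None
    filled = [r if r is not None else rng.choice(["Yes", "No"]) for r in responses]
    return sum(1 for r in filled if r == "Yes")


# Usage: a complete protocol with seven "Yes" responses.
score = total_social_distance(["Yes"] * 7 + ["No"] * 3, random.Random(0))
```

Passing the random generator explicitly keeps the random assignment reproducible, which matters if scores are to be recomputed later.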

A summary of the 16 analyses of variance
for responses to the Total Social
Distance scales appears in Table 9. In
the column headed Belief, it can be seen
that similarity of values again accounts
for the greatest proportion of variance, accounting
for significantly more variance
than either the race, the religion, or the
status effect (p < .01 for all rank-order
differences). The belief effect is significant
in 15 of the 16 samples at p < .001; in the
remaining sample (gentile males, Form
T, Religious Comparison) the belief effect
is significant at p < .05. For the samples of
all three ethnic groups, the belief effect
accounts for significantly more variance
among girls than among boys (rank-order
difference at p < .01).

TABLE 8
ANALYSIS OF RESPONSES ON THE “FRIENDLINESS” ITEM TOWARDS JEWISH TEENAGER STIMULUS PERSONS

Sample (a) | N | Stimulus person | Mean (c) | σ² | Comparison | r | t between means | t between variances
Jewish | 81 | Jewish-unlike values, lower status | 3.1 | .81 | Jewish-unlike values, lower status vs. Jewish-like values, upper status | −.13 | 11.96**** | 2.98*
 | | Jewish-like values, upper status | 1.5 | .49 | Jewish-unlike values, lower status vs. a Jewish teenager (b) | .12 | 13.76**** | 3.73****
 | | A Jewish teenager (b) | 1.4 | .36 | Jewish-like values, upper status vs. a Jewish teenager (b) | .18 | 0.51 | 1.40
Jewish | 78 | Jewish-like values, lower status | 2.0 | .81 | Jewish-like values, lower status vs. Jewish-unlike values, upper status | −.04 | 5.55**** | 1.76
 | | Jewish-unlike values, upper status | 2.9 | 1.21 | Jewish-like values, lower status vs. a Jewish teenager (b) | −.08 | 2.72*** | 1.03
 | | A Jewish teenager (b) | 1.6 | .64 | Jewish-unlike values, upper status vs. a Jewish teenager (b) | .12 | 8.93**** | 2.85***
Gentile | 26 | Jewish-unlike values, lower status | 3.2 | 1.0 | Jewish-unlike values, lower status vs. Jewish-like values, upper status | .41* | 7.02**** | 1.96
 | | Jewish-like values, upper status | 1.8 | .49 | Jewish-unlike values, lower status vs. a Jewish teenager (b) | .03 | 3.01*** | 0.0
 | | A Jewish teenager (b) | 2.3 | 1.0 | Jewish-like values, upper status vs. a Jewish teenager (b) | .28 | 2.31* | 1.86
Gentile | 19 | Jewish-like values, lower status | 1.8 | .36 | Jewish-like values, lower status vs. Jewish-unlike values, upper status | −.18 | 2.31* | 2.24*
 | | Jewish-unlike values, upper status | 2.5 | 1.0 | Jewish-like values, lower status vs. a Jewish teenager (b) | .29 | 1.30 | 2.30*
 | | A Jewish teenager (b) | 2.1 | 1.0 | Jewish-unlike values, upper status vs. a Jewish teenager (b) | −.05 | 1.07 | 0.0

a Each sample involves different subjects; boys and girls are combined.
b From questionnaire given when students were in the eighth grade.
c A low score indicates greater friendliness toward the stimulus person.
* p < .05.
** p < .02.
*** p < .01.
**** p < .001.

For the first time in the analyses reported
here, race appears to have a systematic
influence on subjects’ responses,
although the amount of variance controlled
by the race effect is small. Race was varied
in eight samples, in four of which
there were significant race effects. Three of 
these four were gentile samples and the 
other was Negro females, Form A. An ex- 
amination of the mean scores on the Total 
Social Distance scale to each of the eight 
treatments shows that this Negro sample 
favors Negro to white stimulus persons. 
The three gentile samples favor white to 
Negro stimulus persons. Since the social 
distance scale is designed to assess sub- 
jects’ commitment to interact with the 
stimulus persons rather than just feel 
friendly or unfriendly toward them, race 
might be expected to play a more important
role here than in the case of the
“friendliness” item. 

As can be seen in the column headed 
Religion in Table 9, significant effects for 
religion appear in five of the eight samples 
in which religion was varied. In four of 
these five, the effect is significant at only
the .05 level and in the other, at the .01
level. It is apparent, however, that the re- 
ligion effect, like the race effect, contributes 
a negligible proportion of variance. None- 
theless, religious affiliation becomes impor- 
tant when behavioral commitment rather 
than diffuse expression of friendliness is 
involved. The appearance of significant 
religion effects does not depend system- 
atically on the sex of the respondent or 
on the Form (A or T) of the questionnaire 
administered. Note, however, that three of 
the five samples that showed significant ef- 
fects for religion were Jewish subjects. It 
is somewhat reasonable to say that reli- 
gious membership is particularly salient 
for Jews, both because of their “minority” 
status and because of the emphasis on the 
Jewish way of life in most Jewish homes. 
This fact may also in part explain why
Jewish samples showed particularly strong
belief effects; many of the items in the
value scale reflect important concepts and
ideas in the Jews' cultural and religious
upbringing.

TABLE 9
SUMMARY OF THE 16 ANALYSES OF VARIANCE FOR RESPONSES TO THE TOTAL SOCIAL DISTANCE SCALE

Race, belief, and status comparisons
Sample | Form | N | Race F | Prop. variance | Belief F | Prop. variance | Status F | Prop. variance | Race × Belief F | Race × Status F | Belief × Status F
Negro males | A | 25 | 2.19 | .01 | 24.49**** | .13 | .95 | .00 | .14 | .14 | 0.0
Negro females | A | 25 | 11.07*** | .03 | 126.30**** | .33 | .04 | .00 | 6.63* | .71 | 7.90***
Gentile males | A | 30 | 26.42**** | .12 | 16.20**** | .07 | 12.58**** | .05 | 4.46* | 2.17 | .84
Gentile females | A | 33 | 1.15 | .00 | 88.34**** | .30 | .31 | .00 | 3.73 | 2.80 | 1.34
Negro males | T | 23 | .18 | .00 | 21.51**** | .15 | 3.24 | .02 | .42 | 3.53 | 2.98
Negro females | T | 24 | .73 | .00 | 51.36**** | .30 | 1.97 | .01 | 1.02 | 1.75 | 1.96
Gentile males | T | 26 | 4.37* | .01 | 44.04**** | .19 | 17.92**** | .07 | 0.0 | 5.75* | 2.83
Gentile females | T | 26 | 9.07*** | .04 | 37.32**** | .20 | 4.03* | .02 | 2.67 | .90 | 17.77****

Religion, belief, and status comparisons
Sample | Form | N | Religion F | Prop. variance | Belief F | Prop. variance | Status F | Prop. variance | Religion × Belief F | Religion × Status F | Belief × Status F
Jewish males | A | 69 | 6.51* | .01 | 123.95**** | .22 | 25.12**** | .04 | .40 | .56 | .32
Jewish females | A | 68 | 5.48* | .01 | 236.15**** | .35 | 1.16 | .00 | .48 | 1.48 | .04
Gentile males | A | 32 | 5.38* | .02 | 61.02**** | .28 | .03 | .00 | .06 | .80 | .15
Gentile females | A | 30 | 0.0 | .00 | 65.63**** | .26 | 3.37 | .01 | 2.74 | .16 | .05
Jewish males | T | 84 | 5.27* | .01 | 116.84**** | .19 | 55.59**** | .09 | .72 | 2.16 | 2.16
Jewish females | T | 88 | 1.17 | .00 | 215.93**** | .30 | 56.07**** | .08 | .60 | 3.44 | 7.75***
Gentile males | T | 20 | .98 | .00 | 6.27* | .06 | .23 | .00 | 2.74 | .36 | .20
Gentile females | T | 27 | 7.99*** | .04 | 34.32**** | .19 | 9.44*** | .05 | .89 | .90 | 5.08

The final factor to be examined is status. 
In the comparisons involving race, belief, 
and status, status effects were significant 
in three of the eight samples. These three
are all gentile samples (males, Form A,
p < .001; males, p < .001, and females,
p < .05, Form T). In these samples,
stimulus persons of high status are pre- 
ferred to those of low status, although the 
proportion of variance contributed by the 
status effect is again minimal. In none of 
the four Negro samples was the status ef- 
fect significant. No obvious explanation 
from Rokeach's theory is at hand as to 
why status should be more important for 
gentile than for Negro subjects. It may be 
that Negroes minimize the importance of 
status because of their limited opportu- 
nities to obtain high status. 

The influence of status in analyses in
which it is pitted against religious affiliation
is no greater than in those in which it
is pitted against race. (The rank-order
difference between ω² values for both
status effects is not significant.) The
status effect in the religious comparisons is
significant in four out of eight samples.
In three of these four (all of which are
Jewish samples) it is significant at p <
.001 and in the other, at p < .01 (gentile
females, Form T). Perhaps the relatively
greater importance of status for Jewish 
samples may be understood in terms of 
the integral part that status attributes 
play in the value system held by upper- 
middle-class Jews, such as these subjects. 

Seven of the 48 two-way interactions
were significant. Four of these seven involve
the Belief × Status interaction, and
three of the four involve subjects who took
Form T (see Table 9). A look at the
mean responses to the various stimulus
persons for these samples shows that when
low status is ascribed to stimulus persons,
similarity of values is essentially unrelated
to subjects’ responses. On the other hand,
subjects tend to respond more favorably
to a stimulus person of high status and like
values than to one of high status and unlike
values. The other three significant interactions
are for Race × Belief (Negro
females and gentile males, Form A) and
Race × Status (gentile males, Form T).
For both Race × Belief interactions,
when unlike values are ascribed to stimulus
persons, race is unrelated to subjects'
responses. But when like values are
ascribed to stimulus persons, the Negro
sample preferred Negro to white stimulus
persons and the gentile sample preferred
white to Negro stimulus persons. In the
Race × Status interaction, when lower
status is attributed to stimulus persons,
race is unrelated to subjects' responses.
When upper status, however, is ascribed
to stimulus persons, subjects react more
favorably to white than to Negro stimulus
persons.

No interpretation is offered for these
interactions, which appear to reflect complicated
relationships between race, religion,
and socioeconomic status of the subjects
and the variety of meanings attached
to potential associations with minority
group members in a wide range of social
situations. A separate analysis of the individual
items on the social distance
scale was undertaken to discover what
some of these relationships might be.


Analysis of Individual Social Distance 
Items 


Since subjects responded dichotomously 
(Yes or No) to the individual social dis- 
tance items, the analysis of variance is 
inappropriate for these data. In order to 
present relevant comparisons for in- 
spection, the percentage of endorsement 
for each level of each factor has been 
computed. The means and standard de- 
viations for responses to the individual 
items of the social distance scale for the 
several stimulus persons, forms, and sam- 
ples appear in Stein (1965, pp. 149—180). 
For a given item on the social distance 
scale, the total number of “Yes” responses 
to the four white stimulus persons was 
divided by the number of subjects respond- 
ing to these stimulus persons, to obtain 
the absolute percentage of endorsement 
given to all white stimulus persons. This
process was repeated for responses to the
four Negro stimulus persons, like-valued 
stimulus persons, etc. Looking at the first 
cell in Row one of Table 10, for example, 
we can say that for this sample of 25 Negro 
males, 84% of all responses to the four 
white stimulus persons were “Yes”; that 
is, subjects express themselves as quite 
willing to have a white stimulus person as 
a neighbor. 
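
As a rough illustration of this computation, the sketch below pools the “Yes” responses to one item across the stimulus persons at one factor level. The function name and the per-stimulus counts are hypothetical; only the 84% figure and the sample size of 25 come from the text. The sketch also assumes that the denominator is the total number of responses given (subjects times stimulus persons), which is how the phrase “84% of all responses” is read here.

```python
from typing import Sequence


def endorsement_percentage(yes_counts: Sequence[int], n_subjects: int) -> int:
    """Absolute percentage of "Yes" responses for one item at one factor level.

    yes_counts holds the number of "Yes" responses given to each stimulus
    person at that level (for example, the four white stimulus persons).
    Assumes every subject responded to every stimulus person.
    """
    total_yes = sum(yes_counts)
    total_responses = n_subjects * len(yes_counts)
    return round(100 * total_yes / total_responses)


# Hypothetical counts for 25 subjects and four white stimulus persons,
# chosen so that the pooled result matches the 84% cited in the text.
neighbor_white = endorsement_percentage([22, 21, 20, 21], n_subjects=25)
```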

Tables 10 and 11 illustrate the absolute 
percentages for both Forms A and T. Since 
Tables 12 and 13 present the percentage 
difference between the two levels of each 
factor, these tables will be discussed in de- 
tail. The absolute percentage tables are 
presented in order to give an idea of the 
frequency with which items were endorsed. 
In general, the more intimate the social 
situation, the less frequently the item is 
endorsed. 

Tables 12 and 13 show the percentage 
difference between the two levels of each of 
the three factors: race (or religion), be- 
lief, and status, on Forms A and T, re- 
spectively. These percentages were cal- 
culated by taking the difference between 
the two absolute percentages for the two 
levels of each factor (Tables 10 and 11). 
The percentage responding “Yes” to 
Negro stimulus persons was subtracted 
from the corresponding percentage for 
white stimulus persons, and also for un- 
like values versus like values, lower 
status versus upper status, and Jew versus 
Protestant or Catholic. Thus, the resultant 
percentage, if positive, reflects a preponderance
of positive responses to white, like-valued,
upper status, Protestant or Catholic
stimulus individuals, respectively; and
if negative, a corresponding preponderance 
of positive responses to Negro, unlike- 
valued, lower status, or Jewish stimulus 
individuals. For example, the percentage 
difference of —1 in the upper left cell in 
Table 12 means that 1% more positive re- 
sponses were made to Negro than to white 
stimulus persons: the difference between 
the first two cells in Row 1 of Table 10 
(84% and 85%). Sign tests were computed 
to test for differences between responses to
the levels of a given factor and for sex differences.
All values reported in this section
are based on sign tests.
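
The percentage difference and the sign test can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually run for the study: the function names are hypothetical, the test is taken to be a two-sided exact binomial sign test across the ten items, and zero differences are dropped before testing. The ten race differences used in the example are the gentile male values as they appear in Table 12.

```python
from math import comb
from typing import Sequence


def percentage_difference(pct_first: int, pct_second: int) -> int:
    """Difference between the absolute percentages for two factor levels,
    e.g. white minus Negro, or like values minus unlike values."""
    return pct_first - pct_second


def sign_test_two_sided(differences: Sequence[float]) -> float:
    """Exact two-sided sign test on a set of signed differences.

    Zero differences are discarded (an assumption of this sketch). Under the
    null hypothesis, positive and negative signs are equally likely.
    """
    nonzero = [d for d in differences if d != 0]
    n = len(nonzero)
    if n == 0:
        return 1.0
    positives = sum(d > 0 for d in nonzero)
    k = min(positives, n - positives)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# The worked example from the text: 84% for white, 85% for Negro.
upper_left_cell = percentage_difference(84, 85)

# Race differences for gentile males over the ten Form A items (Table 12).
gentile_male_race = [27, 3, 12, 23, 20, 20, 15, 23, 47, 20]
p_value = sign_test_two_sided(gentile_male_race)
```

With all ten differences positive, the two-sided probability is 2/1024, which is consistent with the p < .01 reported for this comparison.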

Table 12 presents the results for Form 
A for both racial and religious compari- 
sons. A look at the columns headed 
Belief shows that the percentage differ- 
ences are on the whole large and positive. 
In the comparisons involving Negro and 
white stimulus persons, belief appears to 
be as important with respect to items that 
identify casual situations as it does for 
items that identify more intimate ones. 

With respect to differences associated 
with the race of the stimulus persons, 
among the two gentile samples, female 
subjects appear to be more tolerant ra- 
cially than males. A sign test for this sex 
difference is significant at p < .01. For 
the gentile males, the percentage differ- 
ences are moderately large and all posi- 
tive; the males clearly tend to prefer 
white stimulus persons to Negroes (p < .01).
In fact, none of the gentile male
subjects would be willing to have a close 
relative marry any of the Negro stimulus 
persons (see Table 10). For the Negro 
samples, Table 12 tells a somewhat dif- 
ferent story. Negro subjects prefer Negro 
stimulus persons to white for all items 
(p < .01), but the negative percentage dif- 
ferences are all relatively small. We also 
see in Table 12 that gentile subjects, par- 
ticularly males, generally prefer high- 
status stimulus persons to low when status 
is pitted against race and belief. High is 
preferred to low status by both males and 
females (p < .01), but the difference is 
significant only for males. The two Negro 
samples show moderate preference for 
high- rather than low-status stimulus per- 
sons although the difference is not sig- 
nificant, and contrary to the results for the 
gentile samples, there is no significant sex 
difference. 

The bottom half of Table 12 shows the 
results for the comparisons involving re- 
ligion, belief, and status (Form A). With 
regard to religion, gentile males were more 
likely to reject Jews than were gentile fe- 
males (p < .01). The percentage dif- 
ferences for religion are all quite small, but 
gentile females even show a net preference

TABLE 10
ABSOLUTE PERCENTAGES OF “YES” RESPONSES TO THE INDIVIDUAL ITEMS OF THE SOCIAL
DISTANCE SCALE FOR EACH FACTOR LEVEL OF RACE (WHITE, NEGRO) OR RELIGION
(PROTESTANT, CATHOLIC, JEWISH), BELIEF (LIKE VALUES, UNLIKE VALUES),
AND STATUS (UPPER, LOWER), ADULT FORM


Item on social distance scale 


Negro males N = 25 


Negro females N = 25 


Gentile males N = 30 


Gentile females N = 33 


WIN 


LV |UV | US| LS |W |N | LV 


UV | US| LS | W| N| LV|UV 


W | N| EV|UV | US| LS 


Race X Belief X Sta- 

tus comparisons 

Neighbor on street 

Work on charity 
drive. 

Speaking acquaint- 
ance ` 

Go to party 

Member of social 
club 

Live in same apart- 
ment house 

Close personal friend 

Invite home to din- 
ner 

Have- close relative 
marry 

Share an apartment 
with 


Religion X Belief X 
Status Compari- 
sons 

Neighbor on street 

Work on charity 
drive 

Speaking acquaint- 
ance ` 

Go to party 

Member of social 
club 

Live in same apart- 
ment house 

Close personal 
friend 

Invite home to din- 
ner 

Have close relative 
marry 

Share an apartment 


100| 68| 88| 80/72/77| 93| 


86| 77| 83| 80/73|77| 87 


Jewish females N = 68 


57| 71| 79/88/62| 78) 72 


58| 72, 72/73/62| 78| 57 
52) 61| 71/82/58| 73| 67 


63| 78| 72/78,58| 67| 70) 


7665| 91| 50| 71| 69 
66|76| 91) 51) 66| 76 


74/76| 84| 65| 71| 78 
52/69/67| 78| 62| 07| 69 


56/62) 80| 38) 60) 57 


32/33) 54) 10} 36| 29 


81/86 
72/14 


59/67 


93| 
86| 60| 72) 7472/74| 90) 
88 
88) 


83) 
76| 93| 71)82|90) 96 
78| 50} 70) 57/66/71) 90) 


58| 25) 47| 36,2646] 53| 


|o 
uv | Us | is |; | J | Lv|UV 


56| 72| 74/72,59| 78| 54 
68| 85| 78/70/60| 84| 45 


73| 82| 8275/69| 83| 61 
10| 42| 38/5338| 76| 15 
35] 60| 58/5542| 73| 24 
19| 38| 34/3822] 46| 14 


11| 34| 32,3522, 48 


81|85| 94| 73| 87| 80 
78/87/ 90| 74| 87| 78 


with 34|37| 54| 17| 43| 28/30|35| 54. 
Note.—W = White; N = Negro; J = Jewish; P = Protestant; P/C = Protestant/Catholic
(Protestant subjects received Protestant stimulus persons; Catholic subjects received
Catholic stimulus persons); LV = Like values; UV = Unlike values; US = Upper status;
LS = Lower status.

TABLE 11
ABSOLUTE PERCENTAGES OF “YES” RESPONSES TO THE INDIVIDUAL ITEMS OF THE SOCIAL
DISTANCE SCALE FOR EACH FACTOR LEVEL OF RACE (WHITE, NEGRO) OR RELIGION
(PROTESTANT, CATHOLIC, JEWISH), BELIEF (LIKE VALUES, UNLIKE VALUES),
AND STATUS (UPPER, LOWER), TEENAGE FORM


Negro males N = 23 | Negro females N = 24 Gentile males N = 26 | Gentile females N = 26 
Items on social distance scale 


w| NILV |UV | US | LS | W| N| LV|UV | US| LS |W |N |LV UV|US | LS | WIN |LV |UV | US| LS 


Race X Belief X Sta- 
tus comparisons 

Sit next to in class (7670| 83| 63| 82| 64/74/77| 95) 56| 77 7A(78|74| 88) 

Work on committee 


86| 67,7981, 90| 69| 83) 77 


with eslasl. 77| 39| 66} 50,6060, 83| 38| 69| 5155/69| 76 Ti| 47/63/05 81| 48| 75| 54 
Speaking acquaint- 
ance 50/50| 68| 39| 58| 48/62/68 


65 
48 
78| 52| 68| 62)65|74| 82| 56| 75 64|69|79| 86) 62 81| 67 
Go to party to which 
60 
55) 


person was invited |71]74| 85| 61| 73) 72/75, 87| 53| 67| 72/76/62, 78) 76| 62/79/50] 79| 50| 63| 65 
Eat lunch with 67|70| 85| 52| 70| 67/60/55| 80| 35| 57, 5873,06. 84 79| 60|7344| 77, 40) 60) 58 
Member of social 1 

group 50l61| 70| 41| 61| 50|47|55| 74| 29| 55| 4760/58 | 75| 43) 64) 54|5442| 69| 27| 52| 44 
Live in same apart- 


64| 45| 59| 51/75/38 58| 56| 60 54 


ment house 59/53} 66| 46| 59| 52/79) 46) 

49| 18| 41| 27/36/40} 55] 21) 46) 31/36/25) 48) 13 38| 23 
29 
0 


Close personal friend |48443| 56) 34 50| 41/30) 
Invite home to din- 

ner 43146) 59} 30| 50| 3944 
Date brother (sister) |32/35| 46) 22) 41 26)32) 


48| 25| 48| 25/40/21) 44| 17| 36) 25 
28| 13| 26| 15|27|15| 33| 10) 29. 13 


Jewish males | Jewish females | Gentile males N = 20 | Gentile females N = 27

Religion X Belief X 
Status compari- 
sons 

Sit next to in class |81|80| 90| 70| 91| 76/85/88) 93 81| 93| 81/85/67, 87| 65| 76| 76/88/76) 91) 74) 92 72 


Work on committee 


with 6571) 82| 55| 83| 53/06/04, 83) 48 77| 53|5454| 64| 43| 60| 48/68/58) 80) 46) 74| 52 
Speaking acquaint- 
ance ELE 8282| 92| 73| 85| 79|85|86| 95) 76) 90| 81/74/76 83| 67| 87| 63/74 71| 85 59| 78| 07 


Go to party to which 
person was invited |80|87| 91| 76 89| 78|81|78| 92| 68| 84| 76/82/75 87| 69| 73| 84/83,72| 91| 65| 85| 70 


Eat lunch with 7476| 88| 61| 80| 69/61|64| 85| 41 70| 56/62/58] 69| 50| 60| 60/7271, 84 59| 76| 67 
Member of social 

group ios 60/65) 78| 46| 73| 52/53/00| 78) 35) 66| 46,54/59| 71| 43| 50| 63,5540, 67 35 54| 48 
Live in same apart- 


ment house 77/84! 88| 75| 85| 77|82|87| 89| 80 87| 82,74|09| 78) 65| 74| 69|81\61) 74| 68 72) 71 
Close personal friend |3942| 61) 20 53| 28/33/36] 61| 8| 48| 224522 39| 28| 32) 35/48/30. 54) 24) 50) 28 


Invite h to din- 
ner IN 4549| 66| 28| 57| 37,4047, 72) 22 57| 37,4030, 38| 32| 32| 37,55 36| 65 26| 52| 39 


Date brother (sister) |29|42} 48| 23| 48) 24/3230] 53| 18| 49| 22/30/24| 34| 21 38| 17/45/26) 52| 18| 45| 26 


Note.—W = White; N = Negro; J = Jewish; P = Protestant; P/C = Protestant/Catholic
(Protestant subjects received Protestant stimulus persons; Catholic subjects received
Catholic stimulus persons); LV = Like values; UV = Unlike values; US = Upper status;
LS = Lower status.

TABLE 12
PERCENTAGE DIFFERENCES BETWEEN THE TWO LEVELS OF RACE OR RELIGION, BELIEF,
AND STATUS FOR “YES” RESPONSES TO THE INDIVIDUAL ITEMS OF THE
SOCIAL DISTANCE SCALE, ADULT FORM

Negro males N = 25 | Negro females N = 25 | Gentile males N = 30 | Gentile females N = 33
Items on social distance scale
Race | Belief | Status | Race | Belief | Status | Race | Belief | Status | Race | Belief | Status
Race X Belief X Status compari- 
sons 
Neighbor on street —1 |31 8| —5|37| -8| 27 7| 17) (| 41 2 
Work on charity drive —2|39| 13| —4|39 4 3| 27| 13| —9 |40| —9 
Speaking acquaintance —3 | 14| 10| —14 | 28 1| 12| 22| 18| —2 |19| —7 
Go to party to which person 
was invited —15 | 20 7 | —20 | 28 | 10| 23 7|] 17 2 |14| -2 
Member of social club —4 | 29 9| —8 | 52 7| 20| 27| 20] —6 |42 8 
Live in same apartment house —3 | 10 3| —3|23 7| 20| -3 7| 14|23 4 
Close personal friend —5 | 35 | —1| —15 | 60| —5 | 15| 25| 12| —1| 44 7 
Invite home to dinner —13 | 22 2 | —18 | 42 5| 23| 18| 25 8 | 44 4 
Have close relative marry 16 | 12 4| —8 | 30 0| 47| 23| 10| 18|24| 12 
Share an apartment with —7 | 233 | —6 | —18 | 40 | —7 | 20| 17] 10 5 | 40 4 
Jewish males N = 69 | Jewish females N = 68 | Gentile males N = 32 | Gentile females N = 30
Religion | Belief | Status | Religion | Belief | Status | Religion | Belief | Status | Religion | Belief | Status
Religion X Belief X Status com- 
parisons 
Neighbor on street —4 |19| 11| —1| 2 0| 11) 930 2| —7 | 27 0 
Work on charity drive —2 | 26 | —3 | —2 | 34 | —2 | 13 | 24| —2 | —3 | 37 | -3 
Speaking acquaintance —3 |16| 17| —4 | 26 6| 10| 40| —3 | —4| 20 6 
Go to party to which person 
was invited 0 |12| 22| —9 |19 3| —1| 15| —7| —9| 16 9 
Member of social club —13 | 28 | 13 |. —4 | 43 4| 10| 39 3 | —1 | 36 2 
Live in same apartment house —2 | 19 7| —4|19 0 6| 23| —4| —5| 18 5 
Close personal friend —8|49| 13| —2 |61 4| 14| 62 2 7|46| 11 
Invite home to dinner —8|31| 16| —2 |49 2| 12| 49| —1 5|42| 16 
Have close relative marry —17 | 32 | 10 | —20 | 34 5| 16| 32| 10 9| 28| 16 
Share an apartment with —3 | 36 | 14| —5 | 43 2| 13| 39| 10| 10|33]| 10 

for Jews on the six least intimate items.
One reason for this finding may be that
Jewish students tended to predominate in
the “leading crowd” in the Commutertown
high school, as indicated by sociometric
data in the study by Hardyck and
Smith (in preparation). For Jewish subjects,
the only item on which there is
moderately strong in-group preference in
both sexes is “have close relative marry.”
In general, status is not particularly
important, but for the gentile samples it
becomes more so for the more intimate
items, as was not the case in the top half
of the table. Jewish males also show moderately
strong status effects on items
throughout the scale, and these effects
are significantly greater than are those
for Jewish females (p < .05).

Corresponding percentage differences
for the teenage form appear in Table 13.
All the columns headed Belief again show
large positive percentage differences, as
one would expect from the results of the
analysis of variance of total social distance
scores. The findings with respect to
Race for the gentile samples are different,
however, from those with the adult form.
Here, there is no significant sex difference
in preference for white rather than Negro
stimulus teenagers. However, on one item,
“Date your brother (sister),” gentile males
show a much larger preference for the 
white stimulus persons as a date for their 
sibling than do the girls (41% versus 
12%). This is clearly in line with societal 
demands and parallels findings on Form 
A for the item, “have close relative marry.” 

The present results are very similar to
the findings of the Stein et al. (1965)
study, which, it will be remembered,
involved gentile subjects but combined
sexes for the analysis and used
stimulus teenagers who were all high in
status and varied only in race and belief.
They found a significant belief effect on all
10 items (t tests could be employed in
their design) and a significant race effect
on only 3 items: “live in the same
apartment house,” “invite home to dinner,”
and “date brother.” The race effect also
approached the .05 significance level on 
“go to a party with.” They concluded 
that on such “sensitive” items race was 
important because the items reflected areas 
in which there are strong societal pres- 
sures against interracial contact. For the 
gentile samples in the present study, 
these same items show the largest differ- 
ences in endorsements for white as versus 
Negro stimulus teenagers. The comparison 
of male and female subjects, new with 
the present study, shows that females are 
somewhat more likely than males to prefer 
white stimulus persons on the items con- 
cerning “live in the same apartment,” 
“go to party with,” and “invite home to 
dinner” but the reverse is true for “date 
your brother.” In addition, female sub- 
jects show less acceptance of Negroes on 
the item, “eat lunch with.” We can con- 
clude, then, that the results of the Stein 
et al. study are essentially replicated by 
the present findings. For non-Jewish 
white subjects, belief is an important fac- 
tor throughout the social distance scale, 
but race comes into play for the items 
that represent socially taboo areas of interracial
contact.

For the Negro samples, similarity of
belief is again quite important for interpersonal
preference, with the female subjects
tending to show slightly larger positive
percentages for belief than the
males (p < .05). Race effects on individual
items are small and inconsistent.

The bottom half of Table 13 shows the
percentage differences in response to teen- 
age stimulus persons when religion, be- 
lief, and status are varied. Again, belief 
effects are all positive and fairly large for 
all items with all samples. The small re- 
ligion effects may be summarized by say- 
ing that they are strongest for gentile 
females and for gentiles of both sexes for 
the more intimate items. The absence of 
appreciable religion effects on any of the 
items for the Jewish samples is perhaps 
surprising. For these samples, status is also 
moderately important, and becomes more 
so as the items increase in intimacy. 


Responses to Stimulus Individuals Who 
Vary in Race or Religion and Status 
Factors Only 


Stein et al. (1965) had found, in an 
analysis of responses of their subjects to a 
questionnaire that had been administered 
2 months before the presentation of the 
stimulus persons varying in race and 
similarity of belief, that social distance 
to stimulus teenagers described in terms of 
race and status (with no information 
about belief) is determined by both of 
these factors, with the race effect ex- 
plaining twice as much variance as the 
status effect. 

Subjects in the present study had re- 
sponded while in the eighth grade to 
similar stimulus teenagers or adults, with 
religion also varied. For teenage stimulus 
persons, status was varied as in the 
former study; for adults, it was varied by 
the combination of a professional occupa- 
tion with the phrase “is making a good 
income” as opposed to a manual occu- 
pation with the phrase "is making a low 
income." For purposes of the present anal- 
ysis, subjects were classified according to 
their own race and religion, and thus total 
social distance scale scores examined with

TABLE 13
PERCENTAGE DIFFERENCES BETWEEN THE TWO LEVELS OF RACE OR RELIGION, BELIEF,
AND STATUS FOR “YES” RESPONSES TO THE INDIVIDUAL ITEMS OF THE
SOCIAL DISTANCE SCALE, TEENAGE FORM

Negro males N = 23 | Negro females N = 24 | Gentile males N = 26 | Gentile females N = 26
Items on social distance scale
Race | Belief | Status | Race | Belief | Status | Race | Belief | Status | Race | Belief | Status
Race X Belief X Status com- 
parisons 
Sit next to in class 6|20|19| —2 | 40 2 4 | 23 20| —2]|21 6 
Work on committee with 19 | 38 | 16 0|46| 17 | —14 | 28 30} —2 |33| 21 
Speaking acquaintance 6|29|10| =7|26 5| -8 | 26 11| —10 | 25 | 13 
Go to party to which person 
was invited -2|24| 2| —9|34| —5 14 | 17 14 29 | 29 | —2 
Eat lunch with —3 | 33| 2 5| 45} —1 7 | 30 20 29 | 36 2 
Member of social group —12 | 28 | 11 | —8 | 45 8 1 | 32 10 12 | 42 8 
Live in same apartment house 6 | 20) 6 11 | 28} 10 19 | 19 4i 36| 2 6 
Close personal friend 4|22| 8| —8|31| M| —5{ 34 15 12 | 35 | 15 
Invite home to dinner —3 | 29 | 11 | —10 | 43 | 14 14 | 22 23 19 | 27 | 12 
Date brother (sister) —2 | 24 | 15 | —13 | 42 5 41 | 14 10 12 | 23 | 15 
Jewish males N = 84 | Jewish females N = 88 | Gentile males N = 20 | Gentile females N = 27
Religion | Belief | Status | Religion | Belief | Status | Religion | Belief | Status | Religion | Belief | Status
Religion X Belief X Status 
comparisons 
Sit next to in class —5|14|15| —3 | 12| 12 18 | 22 0 12 |16| 20 
Work on committee with —6 | 27 | 30 2 |35| 24 0 | 22 12 11|33| 22 
Speaking aequaintance 0|19| 6 0| 19 8| -2|16 24 3|26| 11 
Go to party to which person 
was invited —7 | 15| 11 3 | 24 8 7|18| —11 11|26]| 15 
Eat lunch with -2|27|11| —3|44| 14 4| 19 0 1|24 9 
Member of social group —5 |32 | 21| —6 |43| 20| —5 | 28) —12 9 | 32 6 
Live in same apartment house | —7 | 12| 8| —5| 8 5 6 | 13 6 20| 6 2 
Close personal friend —2|40|25| =3 |52| 26 22] 1| -8 18 | 30| 23 
Invite home to dinner —5 | 38 | 20 0|50| 20 10| 6| —5 20|39| 14 
Date brother (sister) —13|25|24| —7|36| 28 6 | 13 21 18 | 34| 19 


respect to the stimulus persons specified in
Table 14.⁵

This analysis will be discussed in terms
of a summary of the 24 analyses of
variance computed on these data (Table
15). Means and standard deviations are
presented in Stein (1965, p. 80).

Looking first at the top half of Table 15,
we can see that the race effect was significant
at the .001 level in 9 of the 12
samples, and at lower levels in 2 other
samples (Negro females, Form T, p < .01;
Negro males, Form A, p < .05). It fell
short of significance in only one sample
(Negro males, Form T). Examination of
the means shows that Negro subjects tend
to prefer stimulus persons of their own
race. Inspection of the column in Table 15
showing the proportion of variance contributed,
however, reveals that race seems
to be more important for white samples
than for Negro samples. (N is too small
to compute a sign test.)


⁵ For Catholic subjects, race and religion were
confounded in the descriptions of the Negro stimulus
persons. Since there was no “Catholic Negro,”
the subjects had to respond to a “Protestant Negro.”
Only 591 of the 630 subjects had scores on
the appropriate social distance scales in the interest
and attitude questionnaire.

When pitted against race, status has
significant effects in only 7 of the 12
pertinent samples, and in none of these
does significance approach the .001 level.
The rank-order difference between the
proportion-of-variance values for the race and status effects
shows that race contributes significantly
more variance than status (p < .01). In
comparison to the race effect, then, status
is a less powerful but still important determinant
of choice. These results confirm
the findings of Stein et al.
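
The direction of this comparison can be checked informally from the proportion-of-variance columns in the top half of Table 15. The sketch below is not the rank-order test reported above, which the text does not name; it substitutes an exact two-tailed sign test as an assumption. The twelve pairs of values are transcribed from the table, so any transcription error carries through, and the function name `sign_test_p` is hypothetical.

```python
from math import comb

# Proportion of variance for race and for status, top half of Table 15
# (12 samples, Forms A and T), as transcribed from the table.
race = [.03, .17, .16, .28, .16, .22, .01, .09, .16, .36, .14, .31]
status = [.00, .00, .02, .00, .00, .00, .08, .03, .14, .03, .03, .00]

def sign_test_p(xs, ys):
    """Exact two-tailed sign test on paired values; tied pairs are dropped."""
    diffs = [x - y for x, y in zip(xs, ys) if x != y]
    n = len(diffs)
    k = sum(d > 0 for d in diffs)          # pairs in which xs exceeds ys
    tail = min(k, n - k)                   # size of the smaller group
    p = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(p, 1.0)

p = sign_test_p(race, status)
print(round(p, 4))
```

Race exceeds status in 11 of the 12 pairs, so the sign test also falls below the .01 level, which agrees in direction with the result reported in the text.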

The status effect depends strongly on the form
of the questionnaire. Only two of the six
analyses on the adult form, as compared
with five of six analyses on the teenage
form, yield significant results. The rank-order
difference between the proportion-of-variance values for
Form T versus Form A is significant at
less than the .05 level; more variance is
contributed by status in Form T samples
than in Form A samples. There seem to
be two possible explanations for these
findings. First, the status descriptions in
the adult form may have been too vague
(“making a good income”; “making a low
income”). The other possibility is that
adult status attributes are less important to
teenagers than teenage status attributes.
We may conclude, then, that in the absence
of information about beliefs, race is
a powerful determinant of interpersonal
preference, with status contributing a
smaller but significant influence that is
confined primarily to the teenage form. In
general, there are few significant interaction
effects.

The bottom half of Table 15 presents
the results of analyses in which religion
and status are varied. All samples show
highly significant religion effects (nine
reach the .001 level and the other three
the .01 level). The three samples in which
the religion effect is significant only at
p < .01 are all Protestant samples. This
finding would follow if religious affiliation
is somewhat more salient for the
“minority” Jewish and Catholic subjects
than for the “majority” Protestants. In 10
of the 12 samples there are significant
status effects, with 6 of these being significant
at p < .001. The rank-order difference
between the proportion-of-variance values for religion
and status was not significant. Status


TABLE 14
STIMULUS PERSONS USED IN ANALYSES IN WHICH
RACE OR RELIGION AND STATUS ARE VARIED

Subjects       Race varied          Religion varied

Protestant     White Protestant     White Protestant
               Negro Protestant     White Jewish
Catholic       White Catholic       White Catholic
               Negro Protestant     White Jewish
Jewish                              White Protestant
                                    White Jewish
Negro          White Protestant
               Negro Protestant


differences thus appear to be more im- 
portant when status is varied with religion 
than when it is varied with race—a finding 
that did not appear in the analyses re- 
ported earlier in which similarity of belief 
was also varied. This finding is not too 
surprising, in that religion is a less power- 
ful variable than race, and, of course, 
belief. When only religion and status are 
varied, subjects are as likely to make 
distinctions on the basis of one factor as 
they are on the other. When race and 
status are varied, the factor of race pre- 
dominates, and, as we have seen, when in- 
formation about belief is added, this tends 
to wash out the influence of the other fac- 
tors. 

Again, the status factor seems particu- 
larly important for all four Jewish samples, 
which show status effects significant at 
p < .001 and have status contributing a 
large proportion of the variance. Only 4 
of the 36 interactions are significant. 


IMPLICATIONS 


In a full-scale test of Rokeach’s theory 
of belief prejudice with ninth-grade stu- 
dents, the present results point over- 
whelmingly to the validity of the theory. 
When information about a stimulus per- 
son's beliefs in the area of personal 
values is made available, perceived simi- 
larity—or dissimilarity—in beliefs is the 
TABLE 15
SUMMARY OF THE 24 ANALYSES OF VARIANCE FOR RESPONSES TO THE TOTAL SOCIAL DISTANCE
SCALES OF THE INTEREST AND ATTITUDE QUESTIONNAIRE WHICH WAS GIVEN WHEN THE
STUDENTS WERE IN THE 8TH GRADE

                                   Race or religion        Status                  Interaction effects
Sample               Form   N      F           Prop. of    F            Prop. of   F           F       F
                                               variance                 variance

Race and status varied
Negro males           A     23      5.31*        .03         1.41         .00       1.55       2.31     .22
Negro females         A     18     18.51****     .17          .33         .00       3.66       1.26     E
Protestant males      A     23     19.01****     .16         8.05***      .02       6.70*      2.35    1.15
Protestant females    A     26     40.41****     .28         4.96*        .00       5.54*       .82    1.08
Catholic males        A     27     29.87****     .16         1.54         .00       3.79       2.38    4.26*
Catholic females      A     30     29.18****     .22         1.41         .00       5.97*      1.46     E
Negro males           T     18      3.26         .01         6.43*        .08       1.43       3.44     .15
Negro females         T     20     11.32***      .09         5.64*        .03       3.15       2.28    1.43
Protestant males      T     15     26.17****     .16        13.89***      .14       1.40       2.48     .24
Protestant females    T     16     24.35****     .36        11.89***      .03       8.66*      1.67    0.0
Catholic males        T     28     26.37****     .14        CR is         .03       2.09       1.82    0.0
Catholic females      T     31     69.07****     .31         2.30         .00       3.44       2.99     .01

Religion and status varied
Jewish males          A     52     39.26****     .11        33.64****     .09       2.42       2.53    0.0
Jewish females        A     55     62.09****     .20        16.34****     .04       3.06       2.66     .07
Protestant males      A     23     12.04***      .08         4.54*        .01       3.42       1.86     .09
Protestant females    A     27     14.89****     .08         2.85         .01       2.78       2.85     .22
Catholic males        A     27     28.99****     .18         7.54***      .02       2.06        .81     .17
Catholic females      A     32     22.26****     .22         2.38         .00      24.39****   1.82    4.35*
Jewish males          T     74     37.71****     .04       102.90****     .30       2.54       6.95*   3.50
Jewish females        T     73     28.36****     .04        69.76****     .22       2.37       5.34*    .43
Protestant males      T     15     16.58***      .07        16.14***      .21        .83       2.64    2.46
Protestant females    T     16      8.78***      .10        16.87****     .14       3.82       2.47    3.55
Catholic males        T     27     39.96****     .23        15.78****     .08       1.64       1.45     .97
Catholic females      T     32     22.12****     .12         6.73*        .03       4.05       4.14    2.13

* p < .05.
*** p < .01.
**** p < .001.


primary determinant of attitudes of 
white gentiles toward Negroes and Jews. 
Likewise, knowledge of belief systems, 
when it is made available, is the most im- 
portant factor in Negro and Jewish stu- 
dents’ attitudes toward members of the 
majority. Only secondarily does racial or 
religious membership per se, or high versus 
low relative socioeconomic status, influ- 
ence the students’ feelings and action or 
orientations toward others under these cir- 
cumstances. 

The generality of the findings is impres- 
sive. These results hold up for both teen- 


age and adult forms of the questionnaire, 
for both sexes, and for Negro, Catholic, 
Protestant, and Jewish subjects as well as 
for ninth-grade students in California 
(Stein et al., 1965) questioned about
Negro stimulus persons, in the absence of 
any substantial opportunity for interac- 
tion with Negroes. 

It is important to elaborate on these 
findings since they initially give the ap- 
pearance of opposing common notions of 
prejudice. In conventional accounts, so 
much emphasis has been placed on ethnic 
membership per se as a determinant of 
prejudice toward members of minority
groups that at first it seems incredible 
that belief incongruence could be the ma- 
jor determinant of prejudice. Yet these 
rather striking results can be reconciled 
with the practical importance of ethnic 
membership in social life. 

In the first place, the present study may 
confront gentile teenagers for the first time 
with information that tells them that there 
are Negroes who believe in many of the 
same things that they themselves do; 
that these Negroes hold many values that 
they themselves consider of vital impor- 
tance. We are asking students to make 
decisions about their feelings towards and 
willingness to interact with Negroes whom 
they have not before evaluated from this 
point of view. In a sense, it is like asking 
them, “If Negroes were more like you 
than you think they are and believed in 
the same things you do, would you then 
like them?” Our data give an affirmative 
answer to this question, particularly in 
regard to “feelings” toward Negroes, but 
the answer must be qualified by a second 
important feature of the data. 

The students do make a distinction be- 
tween situations that are relatively free 
from strong societal pressures, on the one 
hand, and ones that represent areas of 
interracial contact in which societal ta- 
boos are continually reinforced, on the 
other. From the data for the individual 
items of the social distance scale, it ap- 
pears that gentile subjects are willing to 
interact with like-valued Negroes in such 
situations as “sit next to in class,” “work 
on a committee with,” “have as a speak- 
ing acquaintance,” “eat lunch with,” and 
even “have as a member of one’s social
group” or “have as a close personal
friend.” However, in situations of culturally
defined intimacy or in which parents
or other adults would be visibly involved
in the contact, the subjects show
great reluctance to interact with Negroes.
Thus, on such items as “invite home to
dinner,” “live in the same apartment
house,” “date brother,” “have a close relative
marry,” and even “have as a neighbor
on the street,” gentile subjects are


much more prone to react in racial terms, 
frequently rejecting contact with Negroes. 

This finding appears to give partial support
to Triandis' (1961) criticism of
Rokeach’s theory. Triandis claimed that 
we object to having a Negro live next 
door to us because he is Negro, not be- 
cause of what he believes. Since this type 
of situation is one in which societal pres- 
sures discourage interracial contact, belief 
is a less powerful determinant than race 
for some subjects. 

Rokeach suggests, however, that we
still need not invoke an interpretation 
based on racial criteria for these kinds of 
situations. Instead, he claims that the 
principle of belief congruence is applicable 
but not in terms of what the Negro is 
seen to believe. In such a situation as 
having a Negro live next door, Rokeach 
suggests that the white person believes 
that the presence of Negroes in the neigh- 
borhood would affect the rise and fall of 
real estate values. This belief, therefore, 
could account for not desiring a Negro as a 
next door neighbor but could be quite in- 
dependent of the white person’s attitude 
toward the Negro. Rokeach offers an ad- 
ditional example: “Suppose a white person 
refused to sit down next to a Negro on a 
bus in Montgomery, Alabama. Is it be- 
cause that person is black, because of cer- 
tain beliefs that he sees the Negro to have, 
or because the white person believes that 
if he sits down the bus driver will ask him 
to get off the bus, or believes that he will 
be breaking a law for which he can be 
arrested?” The question may well be 
raised, however, whether a belief theory of 
prejudice thus extended is capable of 
empirical disconfirmation. The present 
study has found strong support for the 
more restricted version of the theory as 
the major though not exclusive determi- 
nant of interpersonal prejudice. 

Negro subjects, in responding to white 
stimulus persons, make few if any of the 
distinctions that whites do in the situations 
described on the social distance scale. In 
general, they would seem to have little to 
lose and much to gain from social rela-
tions with whites. 

The results for the religious compari- 
sons do not follow this general interpreta- 
tion. Although there are some subjects who 
balk at close personal contact with stimu- 
lus persons of another religion, in almost 
all cases difference in belief is the crucial 
factor in determining interreligious re- 
lations. There are fewer “taboo areas” for 
interreligious contact than for interracial 
contact. In the case of the former, only 
the items, “invite home to dinner,” and 
“have a close relative marry,” and possi- 
bly, “date brother,” seem to reflect so- 
cially strained areas of contact. 

The interpretation of the Rokeach and 
Triandis controversy offered by Stein et 
al. (1965) thus seems to be strengthened
by the present findings. Knowledge of be- 
lief systems, if they reflect belief con- 
gruence, leads many more gentile subjects 
to evaluate their feelings and potential be- 
haviors toward Negroes in a positive man- 
ner. Without this knowledge, Negroes are 
assumed to have dissimilar beliefs and 
values and are consequently rejected. Even 
when information about belief systems is 
supplied, there are still some subjects who 
either feel bound by societal pressures or 
genuinely harbor hostile feelings toward 
Negroes and refuse to interact with them 
particularly in areas of intimate contact. 
It is not surprising, then, that when in- 
formation about beliefs is absent and only 
race and status are varied, racial consider- 
ations become dominant. 

Some cautionary remarks are in order, in 
view of the compelling consistency of the 
findings. This study ignores important con- 
ditions in social reality which might well 
militate against our findings. The theory
of belief prejudice needs to be tested in 
conjunction with variations in the social 
forces which heavily influence the forma- 
tion and maintenance of prejudiced at- 
titudes (see Rokeach & Mezei, 1966, for a 
beginning in this direction). Moreover, the 
theory needs cross-validation in both 
southern and northern communities wherein 
racial strife is a common occurrence and 


where opportunities to perceive similarity 
of beliefs can be limited by conditions in 
the social structure. 

While the values of the factors of race 
and religious affiliation in this study are 
absolute (e.g., white, Negro), the value
items representing the belief factor, and 
the status attributes, are arbitrarily set. 
Other possible values for these factors 
need to be sampled before we can know the 
limits to which the present findings can be 
generalized. 

Caution is also required in regard to 
possible effects of the relative salience 
with which information about race, reli- 
gion, and belief was presented in the ex- 
perimental materials. Race or religion was 
indicated by a mere word whereas the in- 
formation on belief required an entire 
page. One could also, certainly, have 
played down the importance of belief by 
using less relevant values or by reducing 
the amount of contrast between similar 
and dissimilar values. For that matter, 
the salience of race could have been increased
by adding pictures of the stimulus
persons. In addition, it is certainly
“socially undesirable” for these northern
subjects to stress race per se, especially
considering the intellectualism of
the method and setting. Moreover, Triandis
and Davis (1965) were able to obtain 
powerful race effects with subjects classi- 
fied independently as “racially prejudiced” 
and with the use of social distance scale 
items reflecting negative behaviors such 
as “exclude from the neighborhood.” In a 
sense, then, the results are specific to the 
method used and need further checking to 
see how much they are tied to the method. 
To show that one can pick evaluative be- 
liefs, however, that so predominate over 
race is essentially to support Rokeach’s 
theory, even though there are other be- 
liefs for which the prediction might not 
hold. Knowledge of the perceived similar- 
ity of belief systems is clearly a crucial 
factor in the understanding of prejudice. 
Many possible strategies for the solution 
of racial and ethnic tensions follow from 
this fact. 
Multiple studies were pursued of the hypothesis that the manner in which
an individual distributes attention to his body is linked with his traits and 
personality attributes. Attitudes involving the following body awareness 
dimensions were measured: right versus left, front versus back, eyes, total 
body, and heart. A variety of personality parameters were sampled by 
means of questionnaires, selective memory responses, and semiprojective 
reactions to pictures. Conflictual feelings about certain wishes and aims 
were also evaluated from responses to stimuli presented in the Ames Thereness-Thatness
apparatus. College students and psychiatric patients constituted
the samples studied. It was possible to demonstrate that heightened
awareness of specific body sectors is accompanied by characteristic conflicts
and modes of defense.


It has been proposed that an individual's
body experiences reflect the nature of his 
personality defenses. Freud's (1924, 1938) 
descriptions of the oral and anal char- 
acter types depict an explicit relationship 
between investment of energy in certain 
body sectors and the existence of specific 
conflicts and defenses. Related equations 
between body feelings and personality 
patterns have been proposed by Schilder 
(1935), Reich (1949), Alexander (1948), 
and others (Ferenczi, 1916; Fenichel, 1945). 
In almost all of these systems it is assumed 
that personality variables are correlated 
with attitudes toward particular body sectors
as a function of one or both of the following
considerations:
1. It is suggested that in the course of 
socialization the child acquires certain re- 
sponse patterns (e.g. traits) because of 
crucial experiences he has with his parents. 
These experiences often revolve about body 
functions linked with specific areas of his 
body and result in his placing special 
valuations upon these areas. Thus, his
style of sexual behavior might be influenced
by the orientation he adopts from
his parents toward the sexual regions of
his body (e.g., lower half of body). Consequently,
there would be a correlation
between his attitudes toward sexual expression
and his attitudes toward the sectors
of his body having sexual functions.

* This study was partially supported by United
States Public Health Grant M-5178 and also National
Science Foundation Grant GP-1137.

2. Another consideration which has been
noted is the unique closeness of the in- 
dividual’s body to himself as a perceiver. 
His body is the only object in his perceptual
field which he simultaneously perceives
and which is also a part of himself. Its
special closeness to himself (ego, identity) 
maximizes the likelihood that it will re- 
flect and share in his most important 
preoccupations. Like all ego significant 
objects it can become a convenient “screen” 
upon which are projected one’s most sa- 
lient concerns. An example of such pro- 
jection would be provided in the case of 
the individual who feels unimportant and 
inferior and then presumably transfers this 
view to his body by perceiving it as 
smaller than it is. Indeed, Popper (1957) 
and also Wapner and Krus (1959) have 
shown that failure experiences result in 
subjects perceiving themselves as relatively 
shorter in stature. 

The discovery of relationships between 
personality variables and body attitudes 
would open many possibilities. It would 
help to clarify the role of body experience 
in personality processes. Also, it would 
permit a new approach to evaluating per- 
sonality variables based not on their di- 
rect measurement but rather in terms of 
their body attitude representations. In- 
formation about relationships between body 


2 SEYMOUR FISHER 


attitudes and personality variables is only 
beginning to become discernible. By way 
of resumé, the following studies may be 
mentioned. One series of projects has established
that an individual's mode of
experiencing the boundary regions of his 
body (viz. skin and muscle) is linked 
with traits relating to self-assertion, self- 
expression, and mastery (Fisher, 1963; 
Fisher & Cleveland, 1958). Some tentative 
findings are available concerning the size 
one ascribes to one's body and the degree 
to which one is aggressive (Fisher, 1964b) 
and also field dependent (Epstein, 1957). 
Dissatisfaction with one's body has been
shown to be related to feelings of in- 
security (Jourard & Secord, 1955). There 
are also reports which variously: tie in 
attitudes toward the right and left sides 
of one's body with one's degree of hos- 
tility toward the opposite sex (Fisher, 
1965a); relate body awareness to narcis- 
sism (Secord, 1953); and demonstrate a 
relationship between "strength of body 
image" and “effective control of the pri- 
mary process [Reiff, 1962]." These studies 
range widely and revolve about diverse 
measuring procedures. With few exceptions 
they deal not with attitudes toward specific 
regions of the body but rather with broad, 
abstract body-image dimensions. 

The present project was concerned with 
pursuing the personality correlates of a 
series of body-image parameters based on 
a common rationale. Of central interest 
was the question whether differences in 
the relative distribution of an individual's 
attention to various parts of his body or to 
his body as contrasted to nonbody objects 
are accompanied by corresponding person- 
ality differences. Do body-attention pat- 
terns provide meaningful information about 
the personality structure? Several different 
approaches to this issue were undertaken. 


RIGHT-LEFT 


The first approach was concerned with 
the distribution of attention to the right 
versus left body sides. It was anticipated 
that the relative prominence of the right 
and left sides in an individual's body 
Scheme (as defined by focus of attention) 


would be correlated with indexes of sexual
adjustment and sexual identification.
There were several reasons for anticipating
such correlates. One finds a long history 
of anecdotal and clinical observation 
(Fisher, 1965a) suggesting that the right- 
left dimension relates to matters of mas- 
culinity and femininity. It has been assumed 
that the right is symbolic of masculinity 
and strength and the left of femininity and 
weakness. While the few studies dealing 
with the right-left distinction have not 
supported this particular formulation, they 
indicate that right-left is pertinent to sex 
role issues. Fisher (1965a) reported that 
differences in number of male and fe- 
male names applied to puppets placed 
upon the right and left hands were re- 
lated to attitudes regarding the relative 
superiority of men to women. Similarly, 
the ability of subjects to make a clear
distinction in the apparent sizes of their
right and left hands while wearing aniseikonic
lenses has proven to be correlated
with projective indexes tapping sex role 
variables (Fisher, 1960b). It is of interest 
too that boys perceive autokinetic move- 
ment as more right-directional than do 
girls (Fisher, 1962). 


Study 1A 


In first approaching the possible cor- 
relates of the distribution of attention to 
the right and left body sides, an exploratory 
study was undertaken which involved re- 
lating a right-left attention index to a 
gross measure of the individual's sexual 
orientation, that is, his level of hetero- 
sexual interest. Two questions were under 
consideration: (a) Is the degree of right 
versus left attention correlated with hetero- 
sexual interest? (b) If so, what variations 
in such interest accompany greater focus 
on the right as compared to the left side?


Procedure. Relative direction of attention to the
right and left body sides was evaluated by means 
of the Body Focus Questionnaire. This is an instru- 
ment (Fisher, 1964b) which involves the subject 
comparing his degree of awareness of a series of 
different sites on his body. It presents him with a
list of 91 paired references to body sectors (e.g.,
right hand versus left hand, left leg versus right 
leg, stomach versus arm, arm versus neck). He is 
asked to turn his attention upon his body and to 


BODY ATTENTION PATTERNS 3


indicate for each pair of body parts which he is
"most conscious of or aware of right now." Nine
of the comparisons involve a right-side sector ver- 
sus a left-side sector. The remaining items can be 
scored for other body dimensions, but in the pres- 
ent study served as filler to conceal the fact that 
the measurement process was concerned with the 
right-left dimension. A subject's relative degree of 
focus upon the right side of his body could range 
from 0 through 9. Administration of the Body Fo- 
cus Questionnaire took place in a group setting. 
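
The Right score described above can be sketched as a simple count. This is an illustration only: the tuple layout `(option_a, option_b, chosen)` and the function name `right_score` are assumptions, since the text does not say how responses were recorded. Pairs that do not pit a right-side sector against a left-side sector are treated as filler and ignored.

```python
def right_score(responses):
    """Count right-side choices among the right-versus-left pairs.

    `responses` is a list of (option_a, option_b, chosen) tuples, one per
    questionnaire pair, with body sectors written like "right hand".
    """
    score = 0
    for option_a, option_b, chosen in responses:
        sides = {option_a.split()[0], option_b.split()[0]}
        # Score only pairs that compare a right sector with a left sector.
        if sides == {"right", "left"} and chosen.split()[0] == "right":
            score += 1
    return score

example = [
    ("right hand", "left hand", "right hand"),  # scored, right chosen
    ("left leg", "right leg", "left leg"),      # scored, left chosen
    ("stomach", "arm", "arm"),                  # filler, not scored
]
print(right_score(example))
```

With the nine right-versus-left pairs of the questionnaire, the count necessarily falls between 0 and 9, as stated in the text.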

The Edwards Personal Preference Schedule was 
administered to obtain responses to the Hetero- 
sexuality scale which inquires concerning the de- 
gree to which the subject considers heterosexual ac- 
tivities (e.g., to be in love, to kiss the opposite sex) 
as characteristic of himself. 

Subjects. The subjects were 51 male college 
students who were paid a fee for participating. 
Their median age was 20. They were all right- 
handed in order to eliminate the possible effects of 
handedness upon the distribution of right-left at- 
tention. The study was restricted to males because 
data from other sources already cited (Fisher, 
1962) clearly suggest that sex differences are to be 
expected with regard to the significance of right- 
left. 


Results. The mean Body Focus Questionnaire
Right score was 5.5 (σ = 2.3).
The mean Edwards Heterosexuality percentile
score was 53.6 (σ = 30.1). There
proved to be a product-moment correlation
of -.27 (p = .05) between the Right
score and the Heterosexuality score. Thus,
the greater the attention an individual
focuses on the right side of his body,
the less is his apparent heterosexual orientation
as defined by the Heterosexuality
scale.
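
The reported significance level can be checked from the correlation and the sample size alone. The sketch below assumes the usual t test for a product-moment correlation with N - 2 degrees of freedom and a two-tailed probability; the text does not state which test was used. The helper names are hypothetical, and the t distribution is integrated numerically with Simpson's rule so that only the standard library is needed.

```python
import math

def t_from_r(r, n):
    """t statistic for testing a product-moment r against zero (df = n - 2)."""
    return r * math.sqrt((n - 2) / (1 - r * r))

def t_pdf(u, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p value; integrates the density from 0 to |t| (Simpson's rule)."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # area between 0 and |t|
    return 1 - 2 * central

t = t_from_r(-0.27, 51)   # r and N as reported for Study 1A
p = two_tailed_p(t, 49)
print(round(t, 2), round(p, 3))
```

Under these assumptions t is close to -1.96 and the two-tailed probability lies just above .05, in line with the value given in the text.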

Discussion. One sees initial support for 
the view that the right-left dimension of 
the body image reflects aspects of the 
individual's sex role and sexual adjust- 
ment. It is an intriguing question as to why 
focus of attention on the right rather than 
the left side should denote a reduced level 
of heterosexual response. Discussion of this 
issue will be postponed until a later point. 


Study 1B 


An attempt was made to generalize from 
the initial findings by formulating the 
following hypotheses which specify dis- 
turbance in various levels of one’s sexual 
behavior as correlated with increasing em- 
phasis upon the right side of the body: 


The greater the focus of a man’s at- 
tention upon the right side of his body: 

1. The less active will be his general 
level of heterosexual behavior. 

2. The more defensive he will be when 
confronted with stimuli that arouse anx- 
iety about sexual identity. 

3. The more anxious he will be in re- 
sponding to symbolic representations of 
sexual problems and threats. 


Procedure. The Body Focus Questionnaire 
(BFQ) was used to determine the subject’s right- 
left distribution of attention. To increase the re- 
liability of the measure it was administered on two 
separate occasions, with an average of 7 days in- 
tervening. A total Right score was derived equal 
to the sum of the right-side sites chosen on the two 
different occasions. The product-moment correlation
between the test and retest Right scores was
.58.
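
The gain expected from summing the two administrations can be illustrated with the Spearman-Brown formula. This is not a computation reported in the study: it is a standard psychometric estimate, applied here on the assumption that the two administrations behave as parallel measures. The function name `spearman_brown` is hypothetical.

```python
def spearman_brown(r, k=2):
    """Projected reliability of a composite of k parallel administrations,
    given the correlation r between single administrations."""
    return k * r / (1 + (k - 1) * r)

# Test-retest correlation of .58 reported for the Right score.
print(round(spearman_brown(0.58), 2))  # 0.73
```

On that assumption, the summed Right score would have an estimated reliability of about .73, compared with .58 for a single administration.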

Multiple procedures were employed to get at 
the sexual role variables referred to in the hypothe- 
ses. 

Heterosexual activity level was appraised directly
by means of a questionnaire inquiring concerning
the subject’s present and past sexual interaction
with girls. He was asked to indicate the age
at which he began dating; the average number of
dates he had during each of the 4 years of high
school and also currently; and the number of times
he had “gone steady” or been engaged. The following
scores were derived from this information:

1. Age at which began dating. 

2. Average of number of dates per week for the 
4 years of high school. 

3. Average number of dates per week in cur- 
rent life. 

4. A “serious dating” index equal to the num- 
ber of times has “gone steady” (maximum of 2) 
plus a credit of 1 for currently going steady plus 
a credit of 2 for being currently engaged. 
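
The scoring rule for the “serious dating” index (item 4 above) can be written out as follows. The argument names are hypothetical, and the sketch assumes that the going-steady and engagement credits are recorded as independent yes/no facts, which the text does not spell out.

```python
def serious_dating_index(times_gone_steady, going_steady_now, engaged_now):
    """Serious dating index: times gone steady (capped at 2), plus 1 if
    currently going steady, plus 2 if currently engaged."""
    steady_credit = min(times_gone_steady, 2)
    current_credit = 1 if going_steady_now else 0
    engaged_credit = 2 if engaged_now else 0
    return steady_credit + current_credit + engaged_credit

# A subject who has gone steady three times and is going steady now.
print(serious_dating_index(3, True, False))  # 3
```

Under this reading the index ranges from 0 to 5.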

Another procedure was used to determine the 
subject’s reactions to stimuli intended to arouse 
anxiety about sexual identity. The stimuli were 15 
pictures of human figures from a series developed 
by Doidge and Holtzman (1960) to study homo- 
sexuals. The figures were taken from drawings, 
paintings, and statues photographically reproduced 
so as to make the sex ambiguous. The 15 selected 
for the present study were maximally ambiguous 
in this respect. It was presumed that the greater 
an individual's uncertainty about his sexual iden- 
tity the more disturbing he would find such pic- 
tures and therefore the more defensive negative af- 
fect they would arouse in him (e.g., as suggested by 
Murray [1938]). With this rationale, a procedure 
was devised to involve the subject judgmentally 
with each of the pictured figures and to record his 
reactions via ratings. 

Each picture was projected in a semidarkened 
room for 15 seconds. The subject was asked to de- 


TABLE 1
MEANS AND STANDARD DEVIATIONS FOR BFQ
RIGHT SCORES, INDEXES OF HETEROSEXUAL
BEHAVIOR, VAGUE SEX PICTURE RATINGS,
BLACKY PICTURE RANKS, AND SEXUAL
REFERENCE SCORES


Variable                      Mean      σ        N

BFQ right                     10.5     4.1      49ᵃ
Heterosexual behavior
  Age began dating            14.7     1.7      50
  Dates in high school          .8     aD       OA
  Current dates               Tr s 1581149
  Serious dating score         2.9     01:7..   753
Vague sex pictures
  Ugly ratings                44.0     5.5      48
  Unfriendly ratings          43.7     4.7      48
  Unfriendly plus ugly        87.9     8.5      48
Blacky pictures
  Oedipal intensity            5.1     1205     fF
  Castration anxiety          LA PERPE 227.     51
  Love object                  4.2     2.8      51
Sexual references              1.8     1.8      49


a Variations in N are a function of subjects 
either missing certain tests or not answering spe- 
cific questions in a given test. 


cide whether the figure was male or female and to 
indicate on a 5-point scale its apparent degree of
masculinity-femininity. These judgments were obtained
to insure there would be direct confrontation
with the threatening, poorly defined sexual
attributes of each figure. To measure the amount
of defensive negative affect aroused, two other
ratings of each figure were obtained. One was an evaluation
of the attractiveness of the figures on a 5-point
scale ranging from “ugly” to “good looking”
and the other was a rating of the friendliness of the
figures on a 5-point scale ranging from “friendly”
to “unfriendly.” From the ratings three indexes of
defensive negative response were computed:


TABLE 2 
Propuct-Momen? CORRELATIONS OF BFQ 
Rica Scores WITH HETEROSEXUAL 
Activity INDEXES 


BFQ Right versus r Ert 

Age began dating .30 «.05 
(N — 50) 

Average number dates —.21 2.10 
per week in high school (N = 52) 

Average number of dates —.39 «.01 
per week currently (N = 49) 

Score for serious dating — .29 «.05 
(N = 51) 
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1. Sum of the 15 ratings of degree of ugliness. 

2. Sum of the 15 ratings of degree of unfriend- 
liness. 

3. A total score equal to the sum of the ugliness 
and unfriendliness ratings. 
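
The three indexes listed above are plain sums over the 15 pictures. The sketch below illustrates the arithmetic only. It assumes each rating is coded 1 through 5 with higher values meaning more ugly or more unfriendly; the coding direction and the function name are assumptions, since the text does not state them.

```python
def vague_sex_picture_indexes(ugly_ratings, unfriendly_ratings):
    """Return (ugly_sum, unfriendly_sum, total) for one subject.

    Assumes 15 ratings per scale, each coded 1..5 with higher values
    meaning more ugly / more unfriendly (an assumption, not from the text).
    """
    if len(ugly_ratings) != 15 or len(unfriendly_ratings) != 15:
        raise ValueError("expected 15 ratings per scale")
    for rating in list(ugly_ratings) + list(unfriendly_ratings):
        if not 1 <= rating <= 5:
            raise ValueError("ratings must lie on the 5-point scale")
    ugly_sum = sum(ugly_ratings)                # index 1
    unfriendly_sum = sum(unfriendly_ratings)    # index 2
    total = ugly_sum + unfriendly_sum           # index 3
    return ugly_sum, unfriendly_sum, total
```

Under this coding each single index can range from 15 to 75 and the combined index from 30 to 150.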

A third technique was devised to tap anxiety 
about sexual issues and conflicts in terms of re- 
sponses to the Blacky Pictures (Blum, 1949), which 
consist of 11 scenes in which a dog is portrayed as 
engaged in activities illustrating crucial develop- 
mental problems in the psychoanalytic scheme 
(e.g., Oral Eroticism, Castration Anxiety). A new 
approach to measuring response to the pictures was 
attempted. The subject is presented with the 11 
pictures arranged in the order in which they are
usually administered. He is told that in a later
session he will be asked to compose stories about
some of them. In preparation for this, he is to
examine the pictures and put them in rank order,
with the one he would most prefer to elaborate
upon first in sequence and the one he would least
prefer to describe last in sequence. It was assumed 
the greater the anxiety aroused in a subject by a 
picture the more motivated he would be to evade 
involvement with it by assigning it a low rank or- 
der. Only the reactions to three pictures specifically 
related to heterosexual issues were considered per- 
tinent for the present study. The three pictures 
were as follows: Oedipal Intensity, Castration 
Anxiety, and Love Object. It was anticipated that 
the greater the attention devoted to the right side 
of the body the lower would be the rank order as- 
signed to these pictures. 

Another procedure was used to determine how 
easily the individual verbalizes sexual thoughts in 
a free expressive situation. It was presumed that 
the greater his anxiety about sexual matters the 
more difficult it would be for him to make sexual 
references. The opportunity for free expression of
thoughts and images was provided by asking him 
to list on a sheet of paper “20 things you are con- 
scious of or aware of right now.” Responses were 
obtained on two occasions, with a week interven- 
ing. Sexual references were defined to include only 
direct statements about heterosexual interests or 
activities (e.g., “I would like to kiss a girl,” “I am 
going on a date tonight”). Two judges achieved 
91% agreement in their scoring of 40 protocols. 
The Sex Reference score equaled the sum of sexual 
references in the two samples of responses and 
could range from 0 to 40. It was anticipated that 
BFQ Right would be negatively related to the Sex 
Reference score. 
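
The scoring just described can be sketched in a few lines. The function names are hypothetical, and the agreement measure shown (percentage of items on which two judges give the same classification) is an assumption; the text does not say how the 91% figure was computed.

```python
def sex_reference_score(set1_flags, set2_flags):
    """Sum of sexual references over the two 20-item lists (range 0..40).

    Each argument is a list of 20 booleans, True where a judge scored
    the listed item as a direct heterosexual reference.
    """
    if len(set1_flags) != 20 or len(set2_flags) != 20:
        raise ValueError("each set should contain 20 listed items")
    return sum(bool(f) for f in set1_flags) + sum(bool(f) for f in set2_flags)


def percent_agreement(judge_a, judge_b):
    """Percentage of items that two judges classified identically."""
    if len(judge_a) != len(judge_b) or not judge_a:
        raise ValueError("judges must rate the same non-empty set of items")
    matches = sum(1 for a, b in zip(judge_a, judge_b) if a == b)
    return 100.0 * matches / len(judge_a)
```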

A new sample of subjects was used consisting of 
49 male college students with a median age of 20. 
The BFQ Right score was determined by means 
of a revised scale with 15 items instead of 9. 


Results. The BFQ Right score proved 
to be correlated in the predicted direction 
with the subjects’ reports of heterosexually 
motivated behavior. As shown in Table 2, 

TABLE 3
PRODUCT-MOMENT CORRELATIONS OF BFQ RIGHT SCORES
WITH INDICATORS OF NEGATIVE RESPONSE TO
VAGUE SEX PICTURES

BFQ Right versus                       r       p
Sum of ugly ratings (N = 48)          .30    <.05
Sum of unfriendly ratings (N = 48)    .30    <.05
Unfriendly and ugly (N = 48)          .36     .01


the higher his BFQ score the later the age 
at which he began to date girls (r = .30, 
p < .05); the less his current amount of 
dating per week (r = —.39, p < .01); and 
the lower his score for serious dating 
as defined by “going steady” and being 
engaged (r = —.29, p < .05). There was 
also a correlation of —.21 between BFQ 
Right and amount of dating in high school 
which is in the predicted direction, but 
not significant (p > .10). 
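
As a rough check on the significance levels quoted for these correlations, the sketch below uses the Fisher z approximation. This is a swapped-in method chosen because it needs only the standard library; the text does not say how the p values were obtained, and a t table was probably used. The function name is hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def approx_p_value(r, n, tails=2):
    """Approximate p value for a Pearson r based on n subjects.

    Uses the Fisher z transformation: atanh(r) * sqrt(n - 3) is treated
    as a standard normal deviate. This is an approximation, not the
    exact t-based test.
    """
    if n <= 3 or not -1.0 < r < 1.0:
        raise ValueError("need n > 3 and -1 < r < 1")
    z = atanh(r) * sqrt(n - 3)
    upper_tail = 1.0 - NormalDist().cdf(abs(z))
    return tails * upper_tail


# Values taken from Table 2 of the text.
for label, r, n in [("age began dating", 0.30, 50),
                    ("dates in high school", -0.21, 52),
                    ("current dates", -0.39, 49),
                    ("serious dating", -0.29, 51)]:
    print(f"{label}: r = {r:+.2f}, N = {n}, p ~ {approx_p_value(r, n):.3f}")
```

Passing `tails=1` gives the one-tailed value, which is appropriate only when the direction was predicted in advance.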

When one examines the relationships 
between BFQ Right and the Vague Sex 
Picture rating, it is apparent that they are 


TABLE 4
PRODUCT-MOMENT CORRELATIONS OF BFQ RIGHT SCORES
WITH BLACKY PICTURE RANK SCORES AND
SEX REFERENCE SCORES

BFQ Right versus                    r       p
Blacky
  Oedipal intensity (N = 51)       .05     n.s.
  Castration anxiety (N = 51)      .22    >.10
  Love object (N = 51)            -.13     n.s.
Sex reference
  Set 1 (N = 50)                  -.30    <.05
  Set 2 (N = 50)                  -.38    <.01
  Sum (N = 50)                    -.39    <.01


supportive of the hypothesis under con- 
sideration. Table 3 indicates that each of 
the Vague Sex Picture ratings (Ugly and 
Unfriendly) is correlated .30 with BFQ 
Right (p < .05) and that the combined 
Ugly and Unfriendly ratings attain a cor- 
relation of .36 (p = .01) with BFQ Right. 

The results shown in Table 4 indicate 
that BFQ Right has a chance relationship 
with the Blacky Pictures. Only the cor- 
relation between BFQ Right and Castration 
Anxiety approaches significance in the pre- 
dicted direction (r = .22, p > .10). 

Correlations between BFQ Right and the 
Sex Reference scores were significant in 
the predicted direction. BFQ Right had a 
correlation of -.30 (p < .05) with Sex
References in the first set of responses; 
—.38 (p < .01) with Sex References in 
the second set of responses; and —.39 (p 
< .01) with the sum of Sex References 
in both sets. 

Discussion. The findings support the 
proposition that the greater a man’s focus 
of attention upon the right as contrasted to 
the left side of his body the more likely he 
is to be characterized by inhibition in his 
heterosexual behavior, anxiety about his 
sexual role, and difficulty in expressing 
ideas with sexual reference. It is true that 
the Blacky Pictures data were of a chance 
order and therefore not congruent with the 
original hypotheses. Apparently, the sex-role
difficulties revealed in most of the
procedures employed are not detected by
the Blacky Pictures.

A prime question raised by the findings 
is why heterosexual difficulties are associated
with a focus of attention on the right
as opposed to the left body side. Two 
possible explanations will be offered. One 
derives from observations concerning the 
differential response characteristics of the 
right and left sides. The response of the 
right side tends to be slower and more con- 
trolled than that of the left. Schoen and 
Scofield (1935) reported that when the 
eyes of the right-eyed person are shifting 
from one fixation point to a new target, 
the left eye responds first, “snapping” to 
its new position and sometimes overshoot- 
ing, as compared to the right eye, which 
moves more gradually and precisely. Similarly,
Travis and Herren (1929) and Jasper (1937)
noted that when right-handed
individuals were asked to perform a task
quickly and simultaneously with both arms,
the left responded first. Such findings sug- 
gest that in right-handed persons the right 
side is characterized by a stable set which 
facilitates control but also inhibits the 
spontaneity of that side as compared to the 
nondominant side. Jasper (1937, p. 161) 
specifically stated “...the tendency for 
the nondominant side to lead in attempted 
simultaneous movement may indicate a 
greater cortical control (‘inhibition’) of 
the movements of the so-called dominant 
side which is only a counterpart of the 
more highly perfected coordination of 
movement on this side.” If so, the right 
side could become associated with control; 
whereas the left would betoken spon- 
taneity. Thus, the relationship between 
poor heterosexual adjustment and focus 
on the right could be construed as indicat- 
ing that those having difficulties in hetero- 
sexual expression are also those who 
"ignore" the spontaneous side of their 
bodies and concentrate on the “controlled” 
side. To attend to the right could rep- 
resent a set to respond in a careful, self- 
controlled fashion, and such a set might 
be antithetical to the spontaneity required 
for adequate heterosexuality. 

A second possible explanation for the re- 
lationship between focus on the right and 
heterosexual role relates to the fact that 
the right-handed individual is aware that 
his right hand is stronger than his left.
He might, therefore, associate the right 
hand with strength and power which are 
attributes that typically define mascu- 
linity. But the left hand would be for him 
the “weaker one” and in that sense less 
masculine or more feminine. If one were 
doubtful about his masculine adequacy, 
he might express such concern in an anxious 
awareness of his right side which he equates 
with the strength needed to be masculine. 
He could be thought of as anxiously watch- 
ing his right side because he anticipates 
it will not function to provide the power 
he feels he needs to be manly. 


Front-Back 

Another major body-image dimension 
which was studied relates to the differ- 
entiation between the front and the back 
of the body. It has been widely considered 
psychoanalytically, but little experimen- 
tally. Freud (1924, 1938), Abraham (1927), 
Ferenczi (1955), Fenichel (1945), Tausk 
(1933), and Schilder (1935) theorized 
that the back of one's body is largely 
associated with anal functions. Of course, 
Freud originated the idea. He developed 
the concept of an "anal personality" who 
is unconsciously preoccupied with anal 
sensations (linked with the back of the 
body) as the result of conflicts experienced 
during the period of childhood when con- 
trol of the anal sphincter is learned. He pro- 
posed that the conflicts faced by the child 
during the anal period center on issues of 
obedience and passivity versus opposition 
and self assertion. At a more elementary 
level, they presumably relate to control 
versus lack of control of a body function 
which is considered to be dirty and socially 
unacceptable. The “anal personality” is por- 
trayed as having great anxiety about the 
potential loss of control of his anal sphincter 
and the associated implications of disobedi- 
ence and soiling aggression. He is therefore 
said to be defensively strict with himself 
about being spontaneous or impulsive. Also, 
he is defensively clean, orderly, and obedi- 
ent. But it is theorized that while he exer- 
cises such restraint over himself he has an 
underlying resentment about being con- 
trolled which permeates his behavior in the 
form of negativism and stubbornness. 

There have been efforts to study the 
“anal character” concept empirically. Ques- 
tionnaires and projective tests have been 
used to evaluate the meaningfulness of 
“anality” as a trait. Beloff (1957), Barnes 
(1952), Krout and Tabin (1954), and 
Couch and Keniston (1960) have shown 
that questionnaire items presumably sam- 
pling anal attitudes can be formulated 
which are coherent statistically and also 
in relation to the Freudian “anal stage” 
model. Blum (1949) and Miller and Stine 
(1951) have findings indicating that pro- 
jective responses to pictures and story 
completions can be reliably analysed for
anal themes. 

The present study conjectured that the 
greater an individual’s awareness of the 
back of his body the more he is concerned 
with anal sensations and therefore typical 
of the “anal character.” Thus, it was hy- 
pothesized: 

1. The more attention a man devotes to 
his back as compared to the front of his 
body the greater is his anxiety about im- 
pulses “spilling out” and the less does he 
manifest spontaneous, impulsive behavior. 

2. The greater his back awareness the 
stronger is his tendency to avoid direct 
aggressive expression and instead to make 
use of negativism. 

3. His level of anxiety when confronted 
with stimuli that refer to anal functioning 
is positively correlated with his degree of 
back focus. 

4. The more he recalls his parents as 
providing a model of behavior minimizing 
the direct expression of aggressive impulses, 
the more intense is his back awareness. 

This last hypothesis follows from the 
assumption that a source of the “anal 
character’s” anxiety about his aggressive 
potentialities is his perception of his par- 
ents as disapproving of aggression by re- 
stricting its appearance in their own be- 
havior and that of other family members. 

5. A fifth hypothesis was derived from 
work by Miller and Stine (1951) in which 
it was observed that children whose fan- 
tasies were typified by anal themes were, 
in terms of sociometric criteria, unusually 
popular with their peers. Miller and Stine 
speculated that the controlled traits of the 
“anal character” might impress others as a 
sign of being steadfast and orderly and 
elicit favorable evaluations. Relatedly, 
Couch and Keniston (1960) concluded 
from their data that for the “anal re- 
tentive": “The necessity of friction and 
aggressiveness in competitive situations is 
strongly denied, and replaced consciously 
by reactive trust and tolerance for others 
[p. 172].” Thus, from two different per- 
spectives there seemed to be evidence that 
the anal character is proficient in pleasing 
rather than antagonizing others in group
settings. It was therefore hypothesized that
degree of back awareness would be posi- 
tively correlated with the individual’s in- 
terest in group participation and also his 
popularity in such group situations. 


Study 2A 


The hypothesis predicting an inverse 
relation between degree of focus on one’s 
back and behavioral spontaneity was the 
first to be tested. It should be noted that 
Couch and Keniston (1960) have shown 
that among a cluster of traits characteriz- 
ing the “retentive anal character” self- 
control and restraints upon impulsive ex- 
pression are the most prominent. 


Procedure. The intensity of a subject’s attention 
to his back was measured with the Body Focus 
Questionnaire (BFQ). Embedded in the BFQ form 
were references to six? paired front-back body sites 
(e.g., front of head versus back of head, front of
neck versus back of neck), and the subject indi- 
cated in each case whether he was more aware of 
the front or the back site. His score could range 
from 0 through 6. It has been shown in a group of 
52 male subjects that there is a correlation of .44 
(p < .01) between test-retest BFQ Back scores 
secured with an intervening period of 1 week. 
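
A minimal sketch of the scoring and the test-retest check described above. It assumes each of the six paired items is recorded as the string "front" or "back"; the function names and the data layout are hypothetical, and the Pearson formula is the standard product-moment definition rather than anything specific to the BFQ.

```python
from math import sqrt


def bfq_back_score(choices):
    """Count how many of the six front-back pairs were answered 'back'."""
    if len(choices) != 6:
        raise ValueError("this version of the scale has six front-back pairs")
    if any(c not in ("front", "back") for c in choices):
        raise ValueError("each choice must be 'front' or 'back'")
    return sum(1 for c in choices if c == "back")   # range 0..6


def pearson_r(x, y):
    """Product-moment correlation, e.g. between test and retest scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

Scoring every subject at both sessions and passing the two score lists to `pearson_r` would give a test-retest coefficient of the kind reported here.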

The Impulsive scale, one of seven contained in 
the Thurstone Temperament Schedule, was chosen 
to ascertain how much spontaneity typified the 
subject. Thurstone (1953) states, “High scores in 
this category indicate a happy-go-lucky, daredevil, 
carefree, acting-on-the-spur-of-the-moment disposi- 
tion [p. 1].” The Impulse score is based on the sub- 
ject’s reports concerning his own behavior, as in- 
dicated by responding Yes, No, or ? to a series of 
statements.

In dealing with the spontaneity variable, an
unpublished Anal Orderliness scale developed by
Henry Murray was also administered to one sam- 
ple. This scale contains 10 items which inquire con- 
cerning compulsive and perfectionistic behavior 
(e.g., “I do things more slowly and carefully than 
others”; “I am generally methodical and
systematic in the way I go about things”). The subject
indicates his degree of agreement on a 5-point 
scale. 

Subjects. Three different samples of subjects 
were studied with the Thurstone scale. They con- 
sisted, respectively, of 40, 51, and 60 male college 
students. The median age in each of the groups 
was 20. A fourth sample of 52 students (median 
age 20) was studied with the Murray Anal Order- 
liness Scale. 

? The limited number of back items is due to 
the limited number of homologous front-back sites 
on the body which can be clearly defined verbally 
for subjects. 


Results. The BFQ Back median in
Sample 1 was 2 (range 0 through 6). The
Impulse median was 9 (range 5 through
17). When the trichotomized Back scores
were related to the dichotomized (at the
median) Impulse scores by means of
chi-square, it was found that they were
negatively linked at a borderline level
(χ² = 4.6, df = 2, p = .10).
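
The chi-square analysis above relates a three-level variable to a two-level one, giving 2 degrees of freedom. The sketch below shows the usual Pearson chi-square computation for a table of counts, plus the p value for the 2-df case, where the upper tail has the closed form exp(-χ²/2). The function names are hypothetical and the example counts are invented for illustration; the paper does not report the cell frequencies.

```python
from math import exp


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts.

    Assumes every row and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def p_value_df2(stat):
    """Upper-tail probability of a chi-square value with exactly 2 df."""
    return exp(-stat / 2.0)


# Invented 3 x 2 counts (Back thirds by low/high Impulse), illustration only.
stat, df = chi_square([[4, 9], [7, 7], [9, 4]])
print(df, round(stat, 2), round(p_value_df2(stat), 3))

# The statistic reported in the text:
print(round(p_value_df2(4.6), 2))
```

The closed form applies only when df is 2, which is the case for a 3 x 2 or 2 x 3 table.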

In Sample 2 the mean BFQ Back score
was 3.1 (σ = 1.8). For the Impulse scores
the mean was 10.6 (σ = 2.7). A product-moment
correlation of -.24 was found between
the Back and Impulsive scores. With
a one-tailed test, which was used because 
this was a cross-validation attempt, the 
coefficient is significant at the .05 level. 

In Sample 3 the mean Back score was
2.5 (σ = 1.7). The mean Impulse score was
10.6 (σ = 3.5). A significant negative
correlation of -.26 (p < .05, one-tailed test)
was found between the two sets of scores. 

The results for Sample 4, in which the 
Murray Anal Orderliness scale was em- 
ployed, indicated that orderliness was 
significantly and positively correlated with 
BFQ Back (r = .36, p < .01, two-tail test), 
as predicted. The mean Orderliness score 
was 16.2 (σ = 5.6).

Discussion. The data from the four 
samples go along with the expectation that 
the greater a man’s awareness of his back 
the more likely he is to avoid responses 
which are not carefully controlled. None 
of the individual relationships are large but
their consistent directionality over four
samples is encouraging. Since anxiety about
loss of impulse control is a prominent
difficulty ascribed to the “anal personality,”
the above finding adds weight to the notion 
that attention to one's back and "anality" 
have overlapping significance. 


Study 2B 


Other studies were undertaken to de- 
termine whether the relationships of BFQ 
Back to several other anal trait variables 
were in the predicted direction. 


Procedure. A second hypothesis stated that back 
awareness would be negatively correlated with 
open aggressiveness and positively so with
stubbornness or negativism. Two subscales of the
Buss-Durkee Inventory in Buss (1961) were employed
to get at the anger variables: (a) A 10-item Ag- 
gression scale which is typified by assertions like 
“Whoever insults me or my family is asking for 
it,” “If I have to resort to physical violence to de- 
fend my right, I will.” The subject responds by 
answering Yes or No to each item. (b) A 5-item 
Negativism scale which is represented by a state- 
ment like “When someone is bossy, I do the oppo- 
site of what he asks.” 

Back awareness was measured in this case and in 
relation to the other hypotheses that follow by 
means of the BFQ. 

The hypothesis that back awareness would be 
positively correlated with anxiety about stimuli 
with anal significance was explored via responses 
to the Blacky Pictures. One Blacky picture is la- 
beled Anal Sadism and depicts the dog named 
Blacky in a position where his anus is visible and 
it is apparent that he has just defecated. Response 
to this picture was measured with the same proce- 
dure that was used in determining response to the 
pictures with sexual content in the studies con- 
cerned with BFQ Right. 

Another hypothesis had predicted that BFQ 
Back would be negatively correlated with the sub- 
ject’s perception of how openly his parents ex- 
pressed anger. He indicated on a 3-point scale the 
degree to which each of 10 statements concerned 
with behavior expressive of anger applied first to 
his mother and then to his father. Examples of the 
statements follow: Likes a fight; Expresses anger 
openly and directly; Good at telling people off. 
The responses “Not at all true”; “Slightly true”; 
“Very true" were weighted, respectively, 0, 1, 2. 
Total scores could range from 0 through 20.
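
The parental anger score described above is a weighted sum over the 10 statements. A minimal sketch, with a hypothetical function name and the response labels used as dictionary keys:

```python
# Weights as given in the text: 0, 1, 2 for the three response options.
ANGER_WEIGHTS = {
    "Not at all true": 0,
    "Slightly true": 1,
    "Very true": 2,
}


def parent_anger_score(responses):
    """Total anger score for one parent from the 10 rated statements (0..20)."""
    if len(responses) != 10:
        raise ValueError("expected ratings for 10 statements")
    return sum(ANGER_WEIGHTS[r] for r in responses)
```

The score is computed once for the mother and once for the father, so each subject contributes two totals.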

Subjects. The subjects were 55 male college stu- 
dents recruited by payment of a fee. Their median 
age was 20. 


Results. The BFQ Back score mean for 
the subjects who completed the Buss-Durkee
scales was 2.4 (σ = 1.6). The
mean for the Buss-Durkee Aggression scale
was 5.2 (σ = 2.4); and for the Negativism
scale it was 1.9 (σ = 1.4). One can see in
Table 5 that the BFQ Back has chance 
relation to Aggression, but that it is signifi- 
cantly correlated with Negativism in the 
predicted direction (r = .27, p < .05). Ap- 
parently, back awareness is not correlated 
with self-reports of overt aggression, but it 
is positively so with such reports of nega- 
tivistic behavior. 

The Blacky Pictures data were support- 
ive of the proposition that back aware- 
ness is positively correlated with the level 
of anxiety aroused by representations of 
anal function. Table 5 shows that BFQ 
Back is correlated .26 (p < .10) with the 

TABLE 5
PRODUCT-MOMENT CORRELATIONS OF BFQ BACK
WITH “ANAL CHARACTER” VARIABLES

BFQ Back versus                                   r       p
Buss-Durkee aggression (N = 52)                 -.13     n.s.
Buss-Durkee negativism (N = 52)                  .27    <.05
Blacky anal sadism (N = 51)                      .26    <.10
Father anger (N = 50)                           -.38    <.01
Mother anger (N = 52)                            .06     n.s.
Total affiliation with organizations (N = 55)    .21    >.10
Total elective offices held (N = 55)             .27    <.05

a N varies because the subjects gave incomplete
responses to some procedures or else misunderstood
the instructions. In the case of parental ratings
there were instances in which a father or mother
had died when the subject was still a young child
and therefore could not be recalled.

b Does not even attain .20 level.


rank-order placement of the Anal Sadism 
picture. The higher the BFQ Back score
the less willing was the subject to compose 
a story about the Anal Sadism picture. 
This borderline relationship was examined 
by means of chi-square in which the 
dichotomized (at median) Back scores were 
related to the trichotomized (equal thirds 
as possible) Blacky scores. A significant 
chi-square of 6.0 (p = .05, df = 2) was 
found. 

One notes that the mean anger score 
attributed to mother was 7.9 (σ = 3.7)
and to father 5.6 (σ = 2.6). Table 5 indicates
that BFQ Back was, as predicted,
negatively correlated with Father Anger
(r = -.38, p < .01). It had only a
chance correlation with Mother Anger. 
The hypothesis was supported in terms of 
father’s recalled traits but not in terms 
of mother’s. 


Procedure. The fifth hypothesis proposed that 
BFQ Back would be positively correlated with
degree of participation in group activities and also
one’s popularity in such groups. To obtain an index 
of the subject’s amount of participation in groups, 
he was asked to list the organizations to which he 
had belonged in high school and his first year in 
college. His popularity in these groups was esti- 
mated by asking him to list the elective offices he 
had held in each. 

The BFQ score in this study was based on the 
enlarged number of 19 items. 

Subjects. The subjects were 55 male college stu- 
dents whose median age was 20. 


Results. The data dealing with the rela- 
tion of BFQ Back to group participation 
are mildly favorable to the proposed hy- 
pothesis. There is a trend for the predicted 
positive correlation between BFQ Back 
and Total Affiliation with Organizations, 
although it is not significant (r = .21, p 
> .10). Further, the relationship between 
BFQ Back and Total Elective Offices Held 
is significantly positive, as predicted (r = 
.27, p < .05). One can say that those with 
relatively greater back awareness are those 
who most frequently report election to office 
in the organizations to which they belonged. 

Discussion. How has the concept of a 
link between back awareness and “anal 
character” traits fared? The results are 
encouraging. The most pinpointed evidence 
of anal involvement in back awareness is 
offered by the fact that BFQ Back is 
positively correlated with the subject’s re- 
luctance to deal with the Blacky Anal 
Sadism picture. Especially encouraging, 
too, are the Thurstone Impulse scale data 
and the Murray Anal Orderliness scale 
findings which indicate that the individ- 
ual who focuses attention on his back is 
also one who restricts impulse expression 
and behaves with compulsive care. The 
related assumption that such an individ- 
ual would also avoid direct expression of 
aggression and rely instead on indirect 
forms of stubbornness was only partially 
confirmed. BFQ Back turned out not to 
be correlated with Buss-Durkee Aggression, 
but positively so with Negativism. There 
was partial confirmation of the hypothesis 
that back awareness is negatively corre- 
lated with the degree to which one's par- 
ents are recalled as openly showing anger. 
The confirmation was only partial because, 
while the data involving recall of the 
father’s behavior were congruent with
expectation, those pertaining to mother were
not. It is possible that the mother's style 
of anger expression is less important than 
father's in providing a model for a son. 

The most tangential prediction made 
assumed that an individual’s degree of 
back focus would be positively linked with 
how involved he was in group activities 
and also how popular he was in such 
groups. The results affirmed the expectation 
about back awareness and popularity in 
organized groups but indicated only a non- 
significant trend in the predicted direction 
for group participation. The significant find- 
ing should be cautiously interpreted because 
subjects’ reports of their own group be- 
havior were used rather than more objec- 
tive observations. However, it is also true 
that the significant finding is supported by 
the work of Miller and Stine (1951) in 
which preoccupation with anal themes in 
children was found to be related to their 
group popularity. The anal orientation pre- 
sumably basic to back awareness does seem 
to make for popularity with one’s peers. 
This could be regarded as a function of 
modulated self-control or “reactive trust 
and tolerance for others [Couch & Keniston, 
1960]” which might have a pleasing con- 
ciliatory effect. One would have to guess too 
that the negativism usually ascribed to an 
anal orientation is not prominent in peer in- 
teractions. Perhaps such negativism is more 
common in encounters with authority 
figures. 

Paranoid defense. Freud (1950) and 
other analytic theorists (e.g., Ferenczi,
1916) underscored the importance of “anal 
fixation” in the formation of the paranoid 
delusion. It was considered that the para- 
noid delusion represents an attempt to 
disown and project outwardly passive- 
receptive (homosexual) incorporative aims 
derived from fixation on anal-erogenous 
zones. Tausk (1933), Starcke (1920), and 
Van Ophuijsen (1920) attempted to dem- 
onstrate that the persecutor in the de- 
lusion is assigned attributes associated with 
anal sensations and the buttocks. Aronson 
(1952) and Meketon, Griffith, Taylor, and 
Wiedeman (1962) have tested the concept 
of paranoia as a defense against passive-feminine
homosexual impulses by comparing
the frequency of “homosexual signs” in
the Rorschach responses of paranoid as con- 
trasted to nonparanoid schizophrenics. The 
results have largely supported the concept. 
Moore and Selzer (1963) have shown that 
in terms of clinical reports homosexual con- 
flicts are more prominent in the paranoid 
than the nonparanoid schizophrenic. Both 
clinical and experimental observations tend 
to concur with the psychoanalytic formula- 
tion that the paranoid system is a defense 
against homosexual fantasies linked with 
passive anal-receptive attitudes. 


Study 2C 


If the paranoid delusion is correlated 
with anxiety about fantasies with anal 
reference, it should follow from the front- 
back awareness work which has been de- 
scribed that the paranoid would have rel- 
atively high awareness of his back. That 
is, disturbance about anal issues would 
be accompanied by intensified back con- 
cern. Operationally, this means that the 
paranoid schizophrenic should have greater 
back awareness than the nonparanoid 
schizophrenic. 


Procedure. Back awareness was measured with 
a 19-item front-back subscale imbedded in a Body 
Focus Questionnaire containing 110 items. Subjects 
were seen individually. An observer rated on a 3- 
point scale their level of cooperation. 

Subjects. Forty-four male schizophrenics were 
evaluated (paranoid, 27; nonparanoid, 17). They 
had not received shock therapy up to at least 6 
months prior to the test session. The median ages 
in the paranoid and nonparanoid groups were, re- 
spectively, 33 (range 22-47) and 35 (range 18-43). 
This difference is not significant. In both groups 
the median years of education was 12. Ratings for 
cooperation were not significantly different for the 
two groups. Equal proportions (67%) of each group 
were receiving tranquilizing medication; and the 
dosage levels were not significantly different. 


Results. The mean BFQ Back score 
for the paranoids was 8.7 (σ = 3.2) and
for the nonparanoids 6.5 (σ = 3.4). A t test
indicated that, as predicted, the difference
between the groups was significant at the
< .001 level (t = 4.0).

Discussion. The paranoids proved, in 
agreement with the hypothesis, to be more 
focused upon their backs than the
nonparanoids. This presumably means that the
schizophrenics most invested in defending 
themselves against anxiety-provoking anal 
impulses are also the most back aware. In 
predicting the elevated back awareness of 
the paranoid schizophrenic, the meaningful- 
ness of the front-back body awareness di- 
mension is further extended. More impor- 
tantly, the findings support Freud's model 
concerning the relationship of anal sensa- 
tions and anxieties to paranoia. 

Front-back skin resistance ratio. The 
meaningfulness of the front-back body 
awareness dimension provided an op- 
portunity for further testing of a theory 
regarding the relation between body per- 
ception and physiological activation. Fisher 
and Cleveland (1958) proposed that the 
more salient one body sector is in the 
body scheme as compared to another the 
relatively greater will be the physiological 
activation of the former. Basic to this
formulation is the idea that there is a 
mutually reinforcing interaction between 
degree of attention given to a body sector 
and its level of activation. It is assumed 
that certain needs or anxieties may cause 
an individual to focus his attention per- 
sistently upon a body area and that such 
attention may produce an increment in 
activation of the area in the same way that 
thinking of certain muscles may result 
in an increase in their action potential, or
in the way that thinking about putting
food into one's stomach may produce 
changes in stomach activation. In turn, the 
increased activation of an area results in
it becoming a source of augmented sensory
experience which draws further attention
to it. Thus, a circular feedback system in- 
volving attention and activation level 
could be established. Such a system might 
be found to apply to various organs or 
types of tissue (e.g., skin, vasculature,
heart). Support for this view has already 
come from previous studies in which rela- 
tive salience, as defined by the size attrib- 
uted to body parts in projective settings, 
proved to be correlated with their relative 
activation as represented by skin resistance 
levels. Relationships between relative at- 
tributed size and skin resistance have been 
demonstrated for the following body
sectors: front versus back, right versus left,
head versus trunk, upper half versus lower 
half (Fisher, 1958; Fisher, 1960a; Fisher, 
1961a; Fisher, 1961b). 


Study 2D 


It should follow from the above formula- 
tion that the greater an individual's aware- 
ness of the back as contrasted to the front 
of his body the relatively more activated 
should be the first in relation to the second. 
It was hypothesized that the higher the 
BFQ Back score the lower would be the 
resistance level of the back (i.e., more acti- 
vated) in relation to that of the front. 


Procedure. Back awareness was measured by 
means of nine BFQ items. Skin resistance measures 
were recorded with a Brush direct writing oscillo- 
graph. There was a constant current supply of 20 
microamperes and a DC amplifier for measuring 
the voltage across the subject. The record was cali-
brated in ohms. Separate balanced systems were
utilized for the front measure and the back meas- 
ure. Area of recording from the sites was equalized 
by means of pieces of tape with two holes, each 
1⁄4 inch in diameter, punched into them. The period 
of recordings was based on the time required for 
both sites to stabilize to a point of no change for 
a 15-second period. A minimum of 30 seconds was 
taken in any case. The median length of recording 
was 165 seconds. The front recording area was 
taken from the first flat surface on the neck just 
below the Adam's apple. An homologous area on
the back of the neck was selected as the back site. 
Recordings were taken from the neck because it is 
the only sector not covered by clothes in which 
homologous front and back sites can be selected 
with some accuracy. A final reactivity value was 
tabulated equal to the ratio of the front resistance 
level to the back resistance level (front resistance/ 
back resistance) at the point of stabilization.2 The
larger the ratio the greater is the reactivity of the 
back site relative to that of the front. 
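The reactivity value described above can be sketched in a few lines. This is only an illustration: the 20-microampere constant current and the front/back ratio come from the Procedure, while the function names and the voltage readings are hypothetical and are not data from the study.

```python
# Minimal sketch of the front-back reactivity ratio (assumed helper names).
CURRENT_AMPS = 20e-6  # constant current of 20 microamperes, as stated in the Procedure


def resistance_ohms(voltage_volts, current_amps=CURRENT_AMPS):
    """Ohm's law: resistance equals voltage divided by current."""
    return voltage_volts / current_amps


def front_back_ratio(front_ohms, back_ohms):
    """Front resistance divided by back resistance at the point of stabilization."""
    if back_ohms <= 0:
        raise ValueError("back resistance must be positive")
    return front_ohms / back_ohms


# Hypothetical stabilized voltage readings (illustrative only).
front = resistance_ohms(1.3)  # 1.3 V measured across the front site
back = resistance_ohms(1.0)   # 1.0 V measured across the back site
ratio = front_back_ratio(front, back)
print(round(ratio, 2))
```

A ratio above 1 means the back site shows the lower resistance, that is, the greater activation relative to the front.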

Subjects. The subjects were 52 male college stu- 
dents with a median age of 19. 


Results. The median BFQ Back score 
was 2. The median front-back skin re- 


2 The representativeness of the front-back reac- 
tivity ratio derived from the neck was evaluated 
in a special sample of 22 men. Front-back resist- 
ance readings were taken simultaneously from 
neck, upper chest region and lower chest region 
sites. The neck front-back resistance ratio proved 
to be correlated .43 (p < .05) with the upper chest 
front-back ratio and .42 (p < .05) with the lower 
chest front-back ratio. Thus, the front-back values
from the neck are significantly related to those de-
rived from other body sites.

TABLE 6
CHI-SQUARE ANALYSIS OF RELATIONSHIP BETWEEN
BFQ BACK SCORES AND FRONT-BACK
SKIN RESISTANCE RATIOS

                           BFQ Back
Skin resistance ratio    High*    Low     χ²     Significance
High                      19       7      3.9    <.05
Low                       12      14

* High = Above median. Low = At median or
below.


sistance reactivity ratio was 1.3. Because of 
the skewed character of the distributions, 
chi-square was used to examine the re- 
lationship between BFQ Back and the 
front-back skin resistance ratio. One can 
see in Table 6 that, as predicted, they are 
positively and significantly interrelated (χ²
= 3.9, p < .05).
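As a check on the arithmetic, the chi-square for the 2 x 2 counts in Table 6 can be recomputed directly. The sketch below assumes the uncorrected Pearson formula (no continuity correction), since the text does not say which form was used; the function name is illustrative.

```python
# Pearson chi-square for a 2 x 2 table, without continuity correction (an assumption).
def chi_square_2x2(a, b, c, d):
    """Cells are laid out as [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Counts from Table 6: rows are High/Low skin resistance ratio,
# columns are High/Low BFQ Back.
chi2 = chi_square_2x2(19, 7, 12, 14)

# 3.841 is the standard .05 critical value of chi-square with 1 degree of freedom.
significant = chi2 > 3.841
print(round(chi2, 1), significant)
```

Under that assumption the statistic rounds to 3.9, which agrees with the value printed in Table 6 and exceeds the .05 critical value.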

Discussion. Apparently, the greater the 
individual’s relative focus of attention upon 
the back the greater is the activation of the 
skin of his back in relation to that of the 
front. As earlier indicated, similar types of 
relationships have been observed between 
the skin resistance levels of body areas and 
the relative sizes attributed to them. The 
present findings are particularly significant 
because they are the first to show a link 
between a direct appraisal of how much 
attention an individual focuses on a body 
sector and the activation level of the skin 
in that sector. As data accumulate, it be- 
comes evident that the body image and 
distribution of excitation in the body are 
intimately interwoven. 

The question still remains as to the de- 
gree to which the correlation between BFQ 
Back and the front-back resistance ratio 
reflects the fact that those with higher 
activation of the back are receiving a 
greater amount of sensory stimulation 
from the back which is derivative of the 
physiological activation. However, if one 
considers that intensity of back awareness 
has turned out to be related in a meaning- 
ful way to the “anal character” typology 
it becomes difficult to interpret such aware- 
ness as a simple expression of level of back 


activation. That is, some proportion of the 
back awareness would seem to be related 
to attitudes one has learned to take toward 
one’s own anal regions and functions. How 
such attitudes may excite physiological 
activation of the back or in turn be inten- 
sified by physiological variables remains 
to be seen. 


Eyes 


The role of the eyes in obtaining infor- 
mation and also in the expressive function 
of the face has stimulated speculation about 
their psychological and symbolic signifi- 
cance. It has been variously suggested 
that they are unconsciously equated with 
oral "taking in" processes; hostile (“evil 
eye") intent; wishes to see forbidden sexual 
scenes; and even the genital organ itself 
(Fenichel, 1945). It is possible to conceptu- 
alize most of these speculations within the 
category of incorporative intent. Whether 
they suggest oral, sexual, hostile, or nonhos- 
tile aims they depict the eyes as accomplish- 
ing these aims by functioning as a channel 
for admission and “taking in.” It was there- 
fore conjectured that eye awareness would 
be linked with incorporative attitudes. But 
since the eyes are not truly incorporative 
in the way that a body opening like the 
mouth is, it was considered that the person 
who focuses on his eyes is probably one who 
is fearful of real incorporative wishes and 
therefore substitutes the sort of unreal 
ones exemplified by defining the eye as 
an oral channel. This view led to the hy- 
pothesis that degree of eye awareness is 
positively correlated with anxiety about 
incorporation and negatively so with in-
dicators of free expression of incorpo-
rative wishes. The model for this formu- 
lation is provided by the Freudian concept 
that the use of a substitute zone for an 
erogenous purpose is due to anxiety which 
prevents use of the corresponding real 
erogenous zone. Specific hypotheses which 
were derived are listed below: 

1. Degree of eye interest is inversely re- 
lated to the enjoyment of eating. The 
greater the emphasis upon the eyes (pre- 
sumably as a substitute incorporative 
channel) the less does the primary oral 
zone serve as a source of pleasure. 


2. Also, one would expect that the
greater an individual’s focus upon his eyes 
the more anxiety he would evidence when 
responding to food related stimuli. 

3. Quite analogously, degree of eye 
focus should be positively correlated with 
amount of anxiety evoked by symbolic ref- 
erences to incorporation. 

4. Finally, it seemed logical to expect 
that eye interest would be negatively 
correlated with the degree to which one's 
parents were recalled as generous and 
giving. If eye interest depicts a sense of
not being able to secure oral gratification, 
one might anticipate that such an attitude 
would reflect experiences with parents who 
appeared to be selfish and unwilling to give 
of their resources. 


Study 3 


Procedure. Eye awareness was measured with 
the Body Focus Questionnaire (BFQ). Included in 
the BFQ array of paired-comparisons were 11 items 
in which the eyes were compared to other facial 
areas (e.g., eyes versus ears, eyes versus mouth,
eyes versus chin). The BFQ Eye score could range
from 0 through 11. It has been found that the test-
retest coefficient for Eye scores, with a week inter-
vening, is .54 in a group of 52 subjects (p < .001).

The Byrne Food Attitude Scale (Byrne, Go-
lightly, & Capaldi, 1963) was used to test the hy-
pothesis that BFQ Eyes would be negatively cor- 
related with enjoyment of eating. This scale is 
composed of 221 items which inquire concerning 
liking for foods, pleasantness associated with past 
eating experiences, cooking skill of mother, and 
importance of food as a reward and comfort. Re- 
sponses to each item are registered by the subject 
in terms of True or False. Only 47 of the 221 items 
have shown scale coherence for males and these 
are the items which are scored. The higher the
score the more the subject is considered to have a
positive attitude toward eating.

To investigate whether BFQ Eyes is positively 
related to anxiety when confronted with food 
stimuli, a selective memory procedure was used 
which has proven successful in other studies 
(Fisher, 1964a, 1964b). This procedure assumes that 
if a subject is asked to learn anxiety-arousing ma- 
terial, his recall for it will be relatively poorer than 
for equated material without anxiety connotations. 
Subjects (in small groups) were asked to view for 
1 minute a list of 20 words projected on a screen; 
and they were subsequently given 5 minutes to 
write down as many of the words as they could 
recall. The list consisted of 10 words referring to 
food and 10 without food implications which were 
of the same average length and randomly distrib- 
uted. The words in the list are enumerated below: 


Plan Beef 
Mint Road 
Hall Pair 
Bun Broth 
View Cream 
Raisin Fair 
Check Tea 
Honey Trace 
Book Plum 
Hash Stone 


A subject’s score equalled the number of food 
words minus the number of nonfood words re- 
called. 
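The scoring rule just stated can be sketched as follows. The word lists are taken from the list above; how misspellings, repeated words, or intrusions were handled is not reported, so ignoring case, counting each word once, and discarding words not on the list are assumptions, as is the function name.

```python
# Selective memory score: food words recalled minus nonfood words recalled.
FOOD_WORDS = {"beef", "mint", "bun", "broth", "cream",
              "raisin", "tea", "honey", "plum", "hash"}
NONFOOD_WORDS = {"plan", "road", "hall", "pair", "view",
                 "fair", "check", "trace", "book", "stone"}


def selective_memory_score(recalled):
    """Return (# food words recalled) - (# nonfood words recalled).

    Assumptions: case is ignored, duplicates count once, and words
    that were not on the list are simply dropped.
    """
    words = {w.strip().lower() for w in recalled}
    return len(words & FOOD_WORDS) - len(words & NONFOOD_WORDS)


# Hypothetical protocol: three food words, two nonfood words, one intrusion.
example = ["Beef", "Tea", "Plum", "Road", "book", "lamp"]
print(selective_memory_score(example))
```

A negative score indicates selectively poorer recall for the food words, which the procedure takes as a sign of anxiety about food stimuli.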

To ascertain whether BFQ Eyes is positively 
correlated with the level of anxiety aroused by 
symbolic references to incorporation, the Blacky 
Pictures technique was again employed. Only the 
Oral Eroticism and Oral Sadism pictures, which re- 
spectively depict Blacky sucking mother’s breast 
and biting mother’s collar, were considered to be 
pertinent to the hypothesis. It was expected that 
BFQ Eyes would be positively correlated with the 
subject’s reluctance to tell a story about each of 
these pictures. 

The question whether BFQ Eyes would prove 
to be related to the subject’s recall of his parents’ 
generosity required that ratings of the parents be 
obtained. Each subject indicated on a 3-point scale 
how applicable to his mother were each of nine 
statements concerned with generosity (e.g., “Feels 
we should help those weaker than ourselves”; 
“Considers it important to help charitable causes”). 
The same responses were obtained with regard to 
father. Weights of 0, 1, and 2 were applied respec- 
tively to the response alternatives of “Not at all 
true,” “Slightly true,” and “Very true.” Total scores 
could range from 0 through 18. 
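The weighting scheme for the parental generosity ratings amounts to a simple sum. A minimal sketch, using the three response labels and weights given above; the function name and the sample responses are hypothetical.

```python
# Weights for the 3-point scale, as given in the Procedure.
WEIGHTS = {"Not at all true": 0, "Slightly true": 1, "Very true": 2}


def generosity_score(responses):
    """Sum the weighted responses to the nine generosity statements (range 0-18)."""
    if len(responses) != 9:
        raise ValueError("expected responses to exactly nine statements")
    return sum(WEIGHTS[r] for r in responses)


# Hypothetical ratings of one parent (illustrative only).
sample = ["Very true"] * 4 + ["Slightly true"] * 3 + ["Not at all true"] * 2
print(generosity_score(sample))
```

The same function would be applied once to the ratings of the mother and once to those of the father.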

Subjects. The subjects consisted of 62 male col-
lege students recruited by payment of a fee. Their
median age was 20. 


Results. The mean BFQ Eyes score was
6.4 (σ = 2.5). The mean Byrne Food Atti-
tude score was 34.5 (σ = 5.5). Table 7 in-
dicates that BFQ Eyes is, as predicted, sig-
nificantly and negatively correlated with
the Byrne index (r = −.29, p = .03). The
greater the subject’s focus on his eyes the
less he reports enjoyment of food-related ex-
periences.
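One common way to judge whether a product-moment correlation of this size differs reliably from zero is to convert it to a t statistic with N − 2 degrees of freedom. The report does not say which test was applied, so the sketch below is an assumption about method, and the function name is illustrative.

```python
import math


def t_from_r(r, n):
    """Convert a product-moment correlation to a t statistic with n - 2 df."""
    df = n - 2
    t = r * math.sqrt(df / (1.0 - r * r))
    return t, df


# Values reported for BFQ Eyes versus the Byrne index: r = -.29, N = 61.
t, df = t_from_r(-0.29, 61)
print(round(t, 2), df)
```

With 59 degrees of freedom the two-tailed .05 critical value of t is roughly 2.0, so a t of this magnitude falls beyond it, in line with the significance level reported.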

The mean selective food memory score
was +.2 (σ = 2.3), indicating only a
slight tendency for the group to remember
more food than nonfood words. A corre-
lation of −.29 (p < .05) was found be-
tween BFQ Eyes and the memory score.
It would appear, as hypothesized, that with
increasing awareness of one’s eyes there is
a parallel tendency to show selectively
poor recall for references to food.

TABLE 7
PRODUCT-MOMENT CORRELATIONS OF BFQ EYES
WITH INCORPORATIVE ATTITUDE VARIABLES

BFQ Eyes versus                          r       Significance
Byrne food attitude score (N = 61)      −.29     .03
Selective food memory (N = 59)a         −.29     <.05
Blacky oral eroticism (N = 60)           .17     n.s.b
Blacky oral sadism (N = 60)              .28     <.05
Father generosity (N = 57)              −.30     .03
Mother generosity (N = 60)              −.21     .10

a There are variations in N because some pro-
tocols had to be discarded either as a consequence
of misunderstandings of instructions or the in-
appropriateness of certain questions (e.g., per-
taining to a father or a mother who was long
deceased).

b Does not even attain .20 level of significance.


The results for the Blacky Pictures were
not as clearcut. The mean Oral Eroticism
rank was 3.0 (σ = 7.2), and the mean
Oral Sadism rank was 6.7 (σ = 2.4).
Table 7 indicates that while BFQ Eyes was
correlated in the predicted direction with
Oral Eroticism (r = .17), it was not
significantly so. However, BFQ Eyes was
significantly correlated in the expected
direction with Oral Sadism (r = .28, p
< .05). These findings modestly support
the view that the more aware an individual
is of his eyes the greater his anxiety when
perceiving oral themes.

The data involving the ratings of pa-
rental generosity indicated a mean of 5.1
(σ = 2.0) for Father and a mean of 11.7
(σ = 8.4) for Mother. Table 7 reveals that
BFQ Eyes has a significant correlation of
−.30 (p = .03) with Father Generosity,
but a less impressive correlation of −.21
(p = .10) with Mother Generosity.

Discussion. Once again a psychoanalytic 
framework has bridged the gap between 
findings which concern body sensations 
and those depicting personality patterns. 


Eye awareness has proven to be related to 
anxiety about eating and food as indicated 
by the results involving the Byrne Food 
Attitude Scale and the memory for food 
words. In a more modest way, the Blacky 
Pictures findings suggest that eye aware- 
ness may also be associated with anxiety 
about incorporation defined in a general 
symbolic sense. Evidence was found too 
that eye awareness is linked with a male 
subject’s recall of the generosity of his 
father, but not with his recall of mother's 
generosity. The predictions which were sup- 
ported evolved from the complex assump- 
tion that if an individual concentrates his 
attention upon a body area capable of 
serving as a substitute or symbolic opening 
he does so because he is fearful of experi- 
ences with some primary body opening. 
Presumably, the focus upon the symbolic 
opening is an indirect attempt to experi- 
ence what is forbidden elsewhere in the 
body. This assumption is a derivative of 
Breuer and Freud's generalized theory con- 
cerning the mechanisms underlying con- 
version phenomena. 


BODY AWARENESS


The way a person distributes his attention 
to his body may be conceptualized not only 
in terms of the amount he focuses upon vari- 
ous body regions, but also with regard to the 
relative amount he gives to his own body as 
compared to other objects in his environs. 
Previous studies indicate there is great indi- 
vidual variation in the attention devoted to 
one’s own body. Some are intensely con- 
cerned with their own body sensations; 
and at the opposite extreme are others who 
have little such awareness. Measurement 
of body awareness has proven to be feasible 
with a technique based on the frequency 
with which an individual refers to his own 
body when a sample is taken of what lies 
within his immediate awareness (Fisher, 
1964b). With this technique it has been 
possible to demonstrate that in males there 
is a positive relationship between body 
awareness and the prominence of the nutri- 
tive-digestive areas in the body scheme. 
The more aware a man is of his body the 
more he focuses attention upon his stomach, 
gut, mouth, and related accessory sectors. 


It has been shown that general body aware-
ness is positively correlated with stomach
awareness as defined by responses to paired
body comparisons involving the stomach
(e.g., stomach versus heart, stomach versus
arms) in the BFQ. Further, body aware-
ness has proven to be positively correlated
with selective superior recall for words per-
taining to oral-nutritive parts (e.g., mouth,
stomach) as compared to words referring
to nonnutritive areas (e.g., spine, skull) of
the body (Fisher, 1964b).

The preoccupation with nutritive-diges-
tive body sectors accompanying high body
awareness in men intimated that sensa-
tions from the nutritive sectors must by
their own salience and the sensations they
arouse in other body systems play a large 
role in drawing the individual’s attention 
to his body. If oral sensations contribute 
heavily to the male’s body awareness, it is 
logical to expect that at still another level 
body awareness is related to incorporative 
attitudes. The presence of persistent sensa- 
tions in oral-digestive regions could indi- 
cate anxiety about the incorporative func- 
tions of these regions. That is, it might be 
the person who has learned to be fearful 
about incorporation who becomes preoccu- 
pied with sensations from body sectors 
participating in incorporative activity and 
who, therefore, under the stimulus of tun- 
ing in on such sensations arrives at an un- 
usual awareness of his body, as against 
other perceptual objects. The fact that 
body awareness is not correlated with 
awareness of other localized body regions 
besides the oral-nutritive ones indicates 
that such sensations have special potency 
in drawing the male’s attention to his 
body.


Study 4 


In view of the above findings, the fol- 
lowing was hypothesized: 

1. The greater an individual's aware- 
ness of his body the higher will be his 
underlying anxiety about incorporation 
and therefore the more limited his ability 
to enjoy the incorporative process exem- 
plified in eating. 

2. It was anticipated that general body
awareness would be positively correlated
with the amount of anxiety aroused by
references to incorporative themes.

3. Further, if an individual's degree of
body awareness derives from anxiety about
incorporation one might expect that it
would be negatively related to how al-
truistic he recalls his parents to have been.
If there is anxiety about oral gratification,
it could be a function of experiences with
parents who were apparently unwilling to
give.

Procedures. The prominence of the subject's 
body in his own perceptual field was measured in 
terms of what lay within his awareness at a given 
time. He was asked (in a group) to list on a sheet 
of paper "twenty things that you are aware of or 
conscious of right now." The 20 responses given 
were scored by summing the number of references 
he made to his own body. Such body references 
were defined so as to include explicit body designa-
tions (e.g., “My head hurts”), temperature or kin-
esthetic sensations, eating experiences (e.g., “I
would like to eat a piece of pie”) and descriptions
of one's own clothing (e.g., “My shirt is blue”).
Interscorer agreement for two judges for 59 proto-
cols was 95%. The rationale for this measurement
is that the greater an individual’s perceptual focus 
upon his body the more should his body (or ap- 
propriate equivalents) find representation in his 
reports regarding the content of his awareness. A 
subject’s score could range from 0 through 20. 
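Once a judge has coded each of the 20 responses as a body reference or not, the Body Prominence score is a count. The sketch below also shows one simple way to express agreement between two judges as a percentage; the report does not state how its 95% figure was computed, so that part is an assumption, and all names and codings here are hypothetical.

```python
def body_prominence_score(coded):
    """Count the responses a judge coded as body references (range 0-20)."""
    if len(coded) != 20:
        raise ValueError("expected codings for exactly 20 responses")
    return sum(1 for is_body in coded if is_body)


def percent_agreement(judge_a, judge_b):
    """Percentage of items on which two judges gave the same coding (assumed index)."""
    if len(judge_a) != len(judge_b) or not judge_a:
        raise ValueError("codings must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(judge_a, judge_b) if a == b)
    return 100.0 * matches / len(judge_a)


# Hypothetical codings of one protocol by two judges (illustrative only).
judge_a = [True] * 4 + [False] * 16
judge_b = [True] * 3 + [False] * 17
print(body_prominence_score(judge_a), percent_agreement(judge_a, judge_b))
```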

The Byrne Food Attitude Scale, described 
above, was once again used to determine the sub- 
ject’s enjoyment and interest in eating. Also, a 
measure of his preferences for a list of 103 foods 
(20 of which are part of the Byrne scale) was ob- 
tained. He was asked to indicate for each food 
item whether he liked or disliked it. His score was 
the total number of foods liked. It could range 
from 0 through 103. 

The Blacky Pictures ranking procedure was 
once again used to evaluate the subject’s anxiety 
when responding to stimuli referring to incorpora- 
tion. It was expected that the two Blacky pictures 
pertaining to orality (Oral Eroticism, Oral Sadism) 
would be given low preference by those with high 
body awareness. 

The amount of altruism attributed to each of 
the parents was appraised with the same series of 
nine items used to measure parental altruism in the 
study described above of the BFQ eye variable. 

Subjects. The subjects were 58 male college stu- 
dents (median age 20). 


Results. The mean Body Prominence
score was 3.7 (σ = 2.5). The mean Byrne
Food Attitude Scale score was 34.8 (σ =
5.4).

Table 8 indicates that the Body Promi-
nence score was, as predicted, negatively
and significantly correlated with the Byrne
Food Attitude Scale (r = −.30, p < .05).
Included in the Byrne scale is a list of 103
foods, and the subject indicates which he
likes and dislikes. The mean number of
foods liked was 80.6 (σ = 11.8). Body
Prominence proved to be negatively cor-
related with the number of foods liked
(r = −.25, p = .05). These data indicate
that the less pleasurable eating appears to
an individual the greater his body aware-
ness.

For the Blacky Pictures scores the mean
rank for Oral Eroticism was 7.2 (σ = 3.0)
and for Oral Sadism 6.8 (σ = 2.5). Table 8
indicates that Body Prominence was, as
predicted, positively correlated with the
degree to which the Oral Eroticism picture
was put low in the rank sequence (r =
.25). The p value for the correlation is just
short of the .05 level. This relationship was
examined further by means of a chi-square
comparison in which the trichotomized
(into as equal thirds as possible) Promi-
nence scores were related to the dichoto-
mized (at median) Blacky scores. The χ²
of 6.8 (df = 2) was significant at the
<.05 level. However, the prediction re-
garding the relationship of Body Promi-
nence to Oral Sadism was not borne out.

The Father Altruism mean was 10.1
(σ = 3.3) and the Mother Altruism mean
11.8 (σ = 3.5). Table 8 indicates that
Body Prominence was, contrary to predic-
tion, not significantly related to these
variables.

TABLE 8
PRODUCT-MOMENT CORRELATIONS OF BODY
AWARENESS WITH INDEXES RELATING
TO INCORPORATION

Body awareness versus                      r       Significance
Byrne food attitude scale (N = 58)        −.30     <.05
Total number of foods liked (N = 58)      −.25     .05
Blacky oral eroticism (N = 57)             .25     >.05
Blacky oral sadism (N = 56)               −.1      n.s.*
Father altruism (N = 54)                   .03     n.s.
Mother altruism (N = 57)                  −.11     n.s.

* Does not even attain .20 level.

Discussion. There is only modest con- 
gruence between the data and the hypothe- 
ses. The best results were obtained for the 
prediction that Body Prominence would 
be inverse to satisfaction derived from 
eating. The Byrne scale and the index of 
number of foods liked were both related to 
Body Prominence in the fashion antici- 
pated. With increased Body Prominence 
there is a corresponding negative attitude 
toward food intake which can be inter- 
preted as relating to anxiety about incor- 
poration. 

It is encouraging too that with increasing
Body Prominence one finds augmented
anxiety about the Blacky Oral Eroticism
picture as defined by unwillingness to re-
late a story about it. But Body Promi-
nence did not turn out to be related to Oral
Sadism. This may reflect the weakness of
the hypothesis. It is also possible that the
oral anxiety linked with body awareness
pertains specifically to incorporation (as
represented by the Oral Eroticism picture)
and not to the sadistic, biting intent por-
trayed in the Oral Sadism picture.

The formulation relating Body Promi- 
nence, with its presumed concern about 
incorporation, to the subject’s recall of the 
degree of selfishness of each of his parents 
was not affirmed by the findings. One can- 
not trace the attitude toward incorpora- 
tion which is linked with body awareness 
to the simple matter of the parents' re- 
called generosity. 

The results show promising continuity. 
If one considers that body awareness is 
associated with focusing upon the oral re- 
gions of one’s body, inhibited eating be- 
havior, and also anxiety in the perception 
of a pictured oral incorporative theme, it 
is clear that there is some substance to the 
idea that the amount of attention a man 
directs to his body is related to how much 
anxiety he has about taking in and con-
suming. Why such a relationship should
exist is puzzling. However, it has been
noted (Fisher, 1964b) that while body
awareness is encouraged in women by the
culture it is discouraged for men. Van
Lennep (1957) reported that the male
manifests decreasing interest in his own
body as he matures beyond adolescence.
For the female the opposite is true. Per-
haps the association in the male between
body awareness and incorporative anxiety
represents the fact that the male with
oral problems, who in terms of the litera-
ture dealing with orality (Blum, 1949;
Fenichel, 1945) would be expected to have
difficulty in being an independent manly
person, is also the one to be concerned in
an unmanly way with his body sensations.


Heart 


Unrealistic concern with one’s heart has
been reported often as a neurotic symptom 
(Fenichel, 1945; Schneider, 1954). It has 
been conjectured that such concern reflects 
factors like repressed sexual excitement, un- 
expressed rage, and fear of death. In scan- 
ning the statements in the literature about 
what characterizes the person who focuses 
upon his heart, one finds their diversity 
difficult to integrate. Little agreement exists 
as to which affects or impulses might preoc- 
cupy the heart-conscious individual. How- 
ever, there are intriguing references to the 
idea that the heart, because of its special 
importance and its unique prominence as a
source of body sensations and rhythms, may 
easily become involved with the individual's 
fantasies and conflicts. Perhaps it offers a 
convenient focus for feelings and anxieties 
about oneself. 

The heart variable seemed to be worth 
study, but there was little material avail-
able from which to derive hypotheses about
its possible personality relationships. There- 
fore, the decision was made to undertake, 
first of all, some general explorations by 
means of the Body Focus Questionnaire 
(BFQ), which has already been described. 
It contains a subscale of 16 items sampling 
how aware the individual is of his heart. 
The Heart subscale has shown a test-retest 
reliability of .62 (N = 50) over a period of 
1 week. A number of widely scanning stud- 


ies have been pursued with BFQ Heart 
to ascertain its relationships with personal-
ity measures (e.g., Edwards Preference
Schedule) and social variables (e.g., social 
class, religion), but the results have been 
largely of a chance order. It would serve 
little purpose to describe them. One of the 
few promising trends that did emerge was 
the observation that an individual’s heart 
awareness is positively related to his rat- 
ings of religiosity of his parents and also 
himself. This finding seemed noteworthy in 
light of previous reports that persons with 
anxiety about their hearts are unusually 
conscientious (Ross, 1945). In fact, a study 
by Wittkower, Rodger, and Wilson (1941) 
portrays such persons as puritanical, with 
a strong sense of duty and morality. Thus, 
one could discern an initial basis for re- 
garding heart awareness as related to issues 
of morality, religiosity, and virtuous con- 
formity. The possibility presented itself 
that heart awareness would be positively 
correlated with an approach to life empha- 
sizing religious commitment, with its ac- 
companying concern about issues of right 
and wrong. 


Study 5 


Using this framework, the following 
hypotheses were ventured: 

1. The greater an individual's aware- 
ness of his heart the more religious should 
be his orientation.

2. A derived assumption is that his de- 
gree of heart focus would be positively 
related to the amount of religiosity he 
ascribes to his parents. 

3. With increasing heart awareness there
should be enhanced concern and guilt about 
wrongdoing. 

4. Degree of heart focus should be posi-
tively linked with anxiety about sexual
expression, since sexual behavior is among 
the most stringently regulated by religious 
standards. 

5. Relatedly, it was anticipated that in-
tensity of heart focus would be inverse to
the amount of sexual expressiveness the 
individual recalls as typifying his parents. 

A more tangential prediction was also 
made about the relationship between heart 
awareness and openness to aesthetic ex-
periences. Among the few significant rela-
tionships observed in earlier exploratory 
studies was a negative correlation between 
BFQ Heart and the Aesthetic subscale of 
the Allport-Vernon-Lindzey Study of Val- 
ues. This suggested that the more aware an 
individual is of his heart the less is his in- 
terest in, and sensitivity to, artistic and 
imaginative representations. The pertinence 
of this hypothesis to the tentatively formu-
lated religious-oriented picture of the
heart-oriented person is pointed up by the 
fact that previous studies (Allen, 1955) 
reported a trend for religiosity and aes- 
thetic interest to be negatively correlated. 


Procedure. Heart awareness was measured with 
the 16-item Heart subscale of the BFQ. 

Several procedures were employed to evaluate 
religiosity. 

1. The subject estimated the average number of 
times per month he currently attended church. 

2. He rated his own level of religiosity on a 
5-point scale. 

3. His score on the Religious subscale of the 
Allport-Vernon-Lindzey Study of Values was de- 
termined. 

The religiosity ascribed by the subject to his 
parents was evaluated by obtaining his estimates 
of how often each, on the average, attends church 
per month. Also, he indicated on a 5-point scale
his response to the following question: How im- 
portant a part did religion play in your family 
when you were growing up? 

Measurement of guilt and anxiety about wrong- 
doing was approached in two ways. 

1. One involved a selective memory task. It 
was anticipated that the higher an individual’s 
sense of guilt the more he would selectively for- 
get words he had learned which referred to guilt 
themes. The following list of words was exposed 
on a screen in a group setting. 


Round Honest*
Fault* Sight
Judge* Happy
Across Bible*
Book Forge*
Steal* Worker
Ready Law*
Rule* Paint
Bark Wrong*
Verdict* Clerk


[Guilt words are starred]

The subject was told to study the list. After
he had done so for 1 minute, he was given 5
minutes to write on a sheet as many of the words
as he could recall. Ten of the words in the list
refer directly or indirectly to guilt linked ideas;
and 10 are neutral. The mean length of the two
sets of words is equivalent. A selective memory
score was derived equal to the number of guilt
words minus the number of nonguilt words re-
called.

2. A second measure of guilt concern was de-
rived from a previously described ranking proce-
dure involving the Blacky Pictures. One of the
Blacky Pictures, titled “Guilt Feelings,” shows the
dog Blacky being reproved by a figure symbolic
of his conscience. It was predicted that the higher
the BFQ Heart score the lower would be the rank
assigned to the Guilt Feelings theme.

Four different procedures were utilized to exam-
ine the subject’s orientation toward a sexual role
and sexual expression.

1. Amount of heterosexual activity was taken
as one criterion of freedom to be sexually expres-
sive. It was appraised by means of the same,
earlier described, questionnaire which inquires
concerning frequency of dating in high school
and college.

2. Selective memory for sexual words served as
another index of anxiety about sexual issues. Sub-
jects viewed the following lists of words for 1
minute and were then given 5 minutes to recall
them.


Plan Caress*
Touch* Train
Debate Listen
Run Perfume*
Dance* Write
Feel* Kiss*
Build Twist*
Flirt* Skate
Dust Hug*
Date* Color


[Sexual words are starred]

Ten of the words have sexual connotations.
The other 10 are neutral and of the same average
length as the sexual ones. A memory score was
computed equal to the number of sexual minus
the number of nonsexual words recalled. It was
considered that the greater the subject’s anxiety
about sexual matters the more he would show
selectively poor recall for the sexual words.

3. A third way of sampling sexual anxiety in- 
volved the Blacky Pictures ranking procedure. 
The lower the ranks assigned by the subject to 
the three pictures with sex related themes (Love 
Object, Oedipal Intensity, and Castration Anxi- 
ety) the more elevated was his sexual anxiety 
taken to be. 

4. Still another approach to the matter of sexual 
orientation was attempted by means of the Vague 
Sex Pictures earlier described. These are the pic- 
tures which, by virtue of their vague definition 
of the sex of the human figures shown, are in- 
tended to arouse anxiety in those who have 
poorly defined concepts of their own sex roles. 
Degree of anxiety aroused by these pictures is 
evaluated in terms of the amount of negative 
affect they evoke, as evidenced in ratings of the 
attractiveness and intelligence of the figures. 
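
The selective memory scores described above (for guilt words and for sexual words) are simple difference scores, and their arithmetic can be illustrated with a short sketch. The word sets below are the starred and unstarred items of the guilt list; the function name, the treatment of intrusions and repeated words, and the example recall protocol are assumptions made for illustration and are not specified in the procedure.

```python
# Starred (guilt) and unstarred (neutral) items of the guilt word list.
GUILT_WORDS = {"honest", "fault", "judge", "bible", "forge",
               "steal", "law", "rule", "wrong", "verdict"}
NEUTRAL_WORDS = {"round", "sight", "happy", "across", "book",
                 "worker", "ready", "paint", "bark", "clerk"}


def selective_memory_score(recalled, target_words, neutral_words):
    """Return (# target words recalled) - (# neutral words recalled).

    Assumption: words not on either list (intrusions) and repeated
    words are ignored; the original procedure does not say how
    these were handled.
    """
    recalled_set = {word.strip().lower() for word in recalled}
    n_target = len(recalled_set & target_words)
    n_neutral = len(recalled_set & neutral_words)
    return n_target - n_neutral


# Hypothetical recall sheet for one subject (not data from the study).
example_recall = ["Round", "Book", "Steal", "Paint", "Clerk", "Law", "table"]
score = selective_memory_score(example_recall, GUILT_WORDS, NEUTRAL_WORDS)
print(score)  # 2 guilt words minus 4 neutral words
```

A negative score indicates relatively poorer recall of the guilt words, which is the direction expected for subjects with greater guilt concern. The sexual word score would be computed in the same way with the sexual list substituted.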

The subject's perception of how freely his par- 
ents expressed themselves with regard to sexual 
matters was tapped with an eight-item question- 
naire. It was first requested that he indicate on a 
3-point scale the degree to which such items as the
following applied to his father and then to his 
mother: 

1. Likes to be considered physically attractive 
by members of the opposite sex. 

2. Tells jokes with sexual references.

3. Provides advice and counsel on sexual mat- 
ters. 

Weights of 0, 1, and 2 were applied respectively 
to the response alternatives of “Not at all true,”
“Slightly true,” and “Very true.” Total scores
could range from 0 through 16. 
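
The scoring of this questionnaire can be sketched as follows. The weights and the eight-item length are taken from the description above; the function name and the example response pattern are hypothetical and serve only to show how a total between 0 and 16 arises.

```python
# Weights stated in the text for the three response alternatives.
WEIGHTS = {"Not at all true": 0, "Slightly true": 1, "Very true": 2}
N_ITEMS = 8  # the questionnaire is described as having eight items


def expressiveness_score(responses):
    """Sum the weighted responses for one parent (possible range 0-16)."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    return sum(WEIGHTS[response] for response in responses)


# Hypothetical ratings of the father by one subject (illustrative only).
father_ratings = (["Not at all true"] * 3
                  + ["Slightly true"] * 4
                  + ["Very true"] * 1)
father_score = expressiveness_score(father_ratings)
print(father_score)  # 0*3 + 1*4 + 2*1
```

The same function would be applied a second time to the ratings given for the mother, yielding a separate score for each parent.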

Motivation for seeking out and opening oneself 
to aesthetic experiences was measured with the
Aesthetic score of the Allport-Vernon-Lindzey
Study of Values. This is the index which has 
already been mentioned as having been related to 
BFQ Heart in an earlier exploratory study. 

Subjects. Sixty-one male students participated 
as subjects (median age 21). 


Results. The mean BFQ Heart score
was 5.3 (σ = 4.3). Table 9 indicates some
support for the hypothesis that BFQ Heart
is positively correlated with degree of
religiosity. One finds BFQ Heart positively
related at a borderline level with
estimate of frequency of church attendance
and also self-rating of religiosity. When
church attendance frequency and self-rating
of religiosity were simply summed
for each subject, this combined index was
significantly related to BFQ Heart (χ² =
6.9 [df = 1], p < .01). A relationship
of .40 (p < .005) was found between BFQ
Heart and the Study of Values Religious
score.


Table 9 demonstrates too that there are 
trends for the subject’s BFQ Heart score to 
be positively linked with the level of 
religiosity he attributes to his family. It is 
positively correlated (r = .38, p < .01)
with estimates of frequency of mother’s 
church attendance and ratings of impor- 
tance of religion in the family (r = .24, 
p < .10). While it is positively correlated 
with estimates of the frequency of father’s 
church attendance, the coefficient is not 
significant. 

The idea that BFQ Heart is tied in with 
a sense of guilt about wrongdoing was 
slightly reinforced by its borderline negative
correlation (r = −.22, p < .10) with
selective memory for words referring to 
guilt themes. This relationship was ex- 
amined further by means of chi-square. 
The trichotomized (as equal thirds as 
possible) Heart scores were compared with 
the dichotomized (at median) memory 
score. A χ² of 9.1 (df = 2) was found, which
is significant (p < .02). However,
skepticism is encouraged by the fact that 
BFQ Heart was not significantly related 
to the rank assigned to the Blacky Guilt 
Feeling picture. 
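
The grouping that precedes such a chi-square analysis (one variable split into thirds, the other split at the median) can be sketched as below. This is a minimal illustration under stated assumptions: the function names are hypothetical, tied scores are handled in an arbitrary but explicit way, and the scores are invented rather than taken from the study.

```python
from statistics import median


def split_at_median(scores):
    """Label each score 'H' if above the median, otherwise 'L'.

    Assumption: scores equal to the median are placed in the low group.
    """
    m = median(scores)
    return ["H" if s > m else "L" for s in scores]


def split_into_thirds(scores):
    """Label scores 'L', 'M', 'H' by rank so group sizes are as equal as possible.

    Assumption: tied scores may fall on either side of a cutting point,
    according to their order in the input.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    labels = [None] * n
    for rank, i in enumerate(order):
        labels[i] = "LMH"[rank * 3 // n]
    return labels


def cross_tabulate(row_labels, col_labels, row_order="HML", col_order="HL"):
    """Count subjects in each cell of the rows-by-columns table."""
    table = {r: {c: 0 for c in col_order} for r in row_order}
    for r, c in zip(row_labels, col_labels):
        table[r][c] += 1
    return table


# Invented scores for nine subjects (illustrative only).
heart = [1, 2, 3, 4, 5, 6, 7, 8, 9]
memory = [3, 1, 2, -1, 0, 1, -2, -3, -1]
table = cross_tabulate(split_into_thirds(heart), split_at_median(memory))
for level in "HML":
    print(level, table[level])
```

The resulting 3 × 2 table of frequencies is the input to a chi-square test with 2 degrees of freedom.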

The findings for the sexual behavior 
variables were equivocal. Table 11 shows 
that BFQ Heart has only a chance rela- 
tionship to average number of dates per 
week in high school and college and also 
to the index of serious dating. 


TABLE 9
PRODUCT-MOMENT CORRELATIONS OF BFQ HEART WITH RELIGIOUS VARIABLES

BFQ Heart versus                                                  r     Significance level

Estimate of frequency of church attendance per month (N = 61)    .23    <.10
Self-rating of religiosity (N = 61)                              .24    <.10
Study of Values religious score (N = 58)                         .40    <.01
Estimate of frequency of father’s church attendance per
  month (N = 56)                                                 .15    n.s.
Estimate of frequency of mother’s church attendance per
  month (N = 60)                                                 .38    <.01
Estimate of importance of religion in family                     .24    <.10



TABLE 10
PRODUCT-MOMENT CORRELATIONS OF BFQ HEART
WITH GUILT INDEXES

BFQ Heart versus                   r      p

Guilt memory (N = 60)             −.22    <.10
Blacky guilt feelings (N = 60)     .12    n.s.


A borderline negative correlation in the 
predicted direction was observed between 
BFQ Heart and selective recall for sexual 
words (r = −.23, p < .10). Similarly, BFQ
Heart related in the predicted direction at 
a borderline level to Blacky Castration 
Anxiety (r = .24, p < .10). This relationship
was appraised also by means of chi-square
involving the comparison of dichotomized
(at median) Heart scores with
trichotomized (equal thirds as possible)
Blacky scores. The χ² was 9.7 (df = 2),
which is significant (p < .01).
BFQ Heart had a chance relationship to 
Blacky Oedipal Intensity and one with 
Blacky Love Object that was significant in 
a direction opposite to that predicted. 

BFQ Heart was correlated .28 (p < .05) 
with the degree to which the Vague Sex 
Pictures were perceived as ugly. The cor- 
relation of BFQ Heart with how un- 
friendly the Vague Sex Picture figures were 
judged to be was not significant. 


TABLE 11
PRODUCT-MOMENT CORRELATIONS OF BFQ HEART WITH SEXUAL INDEXES

                                                                 r     Significance level

Heterosexual behavior
  Average number of dates per month in high school (N = 61)     .08    n.s.
  Average number of dates per month in college (N = 58)        −.02    n.s.
  Index of serious dating (N = 60)                             −.03    n.s.
Sexual memory (N = 61)                                         −.23    <.10
Blacky Pictures
  Blacky castration anxiety (N = 60)                            .24    <.10
  Blacky oedipal intensity (N = 60)                             .18    n.s.
  Blacky love object (N = 60)                                  −.37    <.01
Parental sexual behavior
  Recall of father’s sexual expressiveness (N = 57)            −.04    n.s.
  Recall of mother’s sexual expressiveness (N = 59)            −.42    <.001
Vague sex pictures
  Ugly ratings (N = 56)                                         .28    <.05
  Unfriendly ratings                                           −.11    n.s.

As anticipated, BFQ Heart had a substantial
negative correlation (r = −.42,
p < .001) with recall of how sexually expressive
mother was but a chance correlation
with the same index as it applies to
father.

A final result to be mentioned is the fact 
that there was an encouraging correlation 
of −.31 (p < .03) in the predicted direction
between BFQ Heart and the Aesthetic
scale of the Study of Values. 

Discussion. The results embracing re- 
ligiosity of self and recalled religiosity of 
one’s parents concur with the hypothesis 
that the more aware an individual is of his 
heart the greater his current and past com- 
mitment to religious values. Although some 
of the correlations between BFQ Heart 
and religious parameters (e.g., frequency 
of church attendance and self-rating of 
religiosity) are not very substantial, they 
still carry weight because they represent 
cross-validation of the same relationships 
which were observed in an earlier study. 

The hypothesis concerning the associa- 
tion of guilt about wrongdoing with heart 
awareness was supported by the findings 
involving selective memory for guilt words 
but not so by the Blacky Guilt Feelings 
data. The results for the selective memory
variable do seem to be worth further study. 
They not only attained statistical signifi- 
cance, but have an attractive pertinence to 
the concept of heart awareness as being a 
function of a religious, moralistic orien- 
tation. 

There is no sign of a relationship be- 
tween BFQ Heart and reported frequency 
of heterosexual behavior. Heart awareness 
showed a borderline inverse relationship to 
the ability to recall sexual words that had 
been learned. If the repression of the sexual 
words is ascribed to their arousal of anxi- 
ety, one can consider the possibility that at 
least at the level of thought and verbal 
concept increased heart awareness might 
be accompanied by increased anxiety about 
sexual themes. The possibility that heart
awareness is linked with sexual anxiety 
was also reinforced by the significant posi- 
tive correlation found between BFQ Heart 
and ratings of ugliness of the Vague Sex 
Pictures. But a skeptical attitude is encouraged
by the fact that the correlations
of BFQ Heart and the Blacky Pictures 
portraying sexual themes were largely not 
as predicted. To further complicate mat- 
ters, one notes that BFQ Heart had, as pre- 
dicted, a substantial negative correlation 
with recall of mother’s sexual expressive- 
ness, although not with father’s. The results 
for the sexual variables are complex and 
inconclusive. A few leads are promising 
which indicate some relationship between 
heart awareness and sexual anxiety. The
results do not suggest that heart awareness 
is related to sexual behavior in any gen- 
eralized sense. 

Cross-validation was obtained of the 
original finding that heart awareness is in- 
verse to interest in aesthetic experience. If 
one conceptualizes aesthetic interest as 
indicating openness to novel representations 
and fantasy productions, it would follow 
that the heart-focused individual tends to 
seal himself off from such stimuli. This 
finding can be used as a keynote to inte- 
grate much of the data. Focusing upon 
one’s heart can be regarded as part of a 
way of life which revolves about a closed- 
off world defined by religious precept and 
perhaps also guilt. It may be an important 
part of this way of life to feel guilt and 
anxiety about certain forms of fantasy, 
particularly those expressing sexual wishes 
or the urge to do what is wrong. 

It becomes an exciting matter to deter- 
mine why an individual’s intensity of at- 
tention to his heart should be linked with 
such an orientation. One could pursue 
Fenichel’s suggestion that the heart be- 
cause of its rhythmic pulsation and its 
growing larger-growing smaller qualities 
is easily associated with sexual experience. 
As such, devotion of attention to one's 
heart could represent an anxious concern 
with an organ symbolizing illicit excite- 
ment incompatible with a religious orien- 
tation. Of course, an argument against 
this formulation would be the fact that 
BFQ Heart had rather inconclusive rela- 
tionships with the sexual variables which 
were studied. 

Another speculation could go to the 
opposite extreme and suggest that the 
heart is one of the morally “safest” body organs
to which one can direct one's attention.
There are no taboos about referring or
attending to one's heart. This contrasts 
with the fact that overtones of sex, dirt, 
and other embarrassing topics apply to 
many other major body sectors (e.g., gut, 
genitals). Perhaps the individual raised in 
a moralistic atmosphere which contains 
taboos about looking at, or touching, 
“bad” body regions would find his heart 
one of the few safe allowable body experi- 
ences. In focusing upon his heart he could 
experience awareness of his body, but 
without showing interest in the “bad” 
side of himself. 


PERCEPTUAL SELECTIVITY IN TERMS OF 
THE AMES THERENESS-THATNESS 
APPARATUS 


Study 6 


Having completed the above studies, it 
was decided to apply the results to making 
pinpointed predictions about the relation- 
ship between BFQ variables and perceptual 
selectivity. The question was whether 
body-attention parameters could be used 
to anticipate how subjects would react to 
pictures with varying themes presented in 
the Ames Thereness and Thatness Table 
(T-T) (Kilpatrick, 1952). The Ames T-T 
can be used to create an ambiguous per- 
ceptual situation in which the value or 
emotional significance of a stimulus (e.g., 
picture) can be determined in terms of size 
or distance characteristics ascribed to it 
(Hastings, 1952; Hastorf, 1950; Kilpatrick, 
1952). It was anticipated that if an indi- 
vidual were asked to make a judgment in 
the T-T setting about a stimulus touching 
on the conflict associated with one of his 
body attention patterns he would display 
sensitivity to that stimulus. For example, 
if he had a high BFQ Right score he would 
be expected to register an exaggerated re- 
sponse to a heterosexual theme. Or if he had 
a high Body Prominence score he might 
demonstrate an accentuated reaction to an 
oral stimulus. The following were the pre- 
dictions made: 

1. BFQ Right will be positively correlated
with accentuated response to pictures
with heterosexual content (viz., a nude
female).

2. BFQ Back should be positively cor- 
related with exaggerated reaction to a 
picture with homosexual connotations (e.g.,
rear view of a nude male). An explanation
is in order concerning the derivation
of this hypothesis. The data which
have been collected relating to the Back 
dimension indicate that back awareness is 
positively correlated with the occurrence 
of “anal character” traits. Such traits are, 
within the psychoanalytic model, basic to 
an orientation typified by ambivalent at- 
titudes toward men such as are associated 
with homosexual conflicts. It was with 
this concept in mind that it was earlier 
predicted and verified that paranoid schizo- 
phrenics, whose principal conflicts are pre- 
sumably homosexual in nature, would be 
typified by high back awareness. 

3. It may be anticipated that BFQ 
Eyes will be positively related to indicators 
of augmented response to an oral theme 
(viz., picture of ice cream). 

4. The prediction about BFQ Eyes 
would be expected to apply analogously to 
the Body Prominence dimension. 

5. The prediction chosen for BFQ Heart 
was to the effect that it would be positively 
correlated with accentuated response to 
the theme of flagrant sexuality (viz., nude 
female). This hypothesis derives, of course, 
from the idea that an open display of sex- 
uality is particularly disapproved in re- 
ligious and puritanical systems. 


Procedure. Measures relating to Back, Right, 
Eyes, and Heart awareness were obtained with the 
110-item BFQ form. Body Prominence was ap- 
praised with the same procedure as already de- 
scribed above. 

The Thereness-Thatness technique was em- 
ployed for measuring perceptual response. It 
consists of two viewing tunnels which are side by 
side. The tunnel on the right, which is com- 
pletely dark, contains no cues for distance and 
therefore none for size. The stimulus to which 
the subject responds is projected on a screen set 
up in this tunnel at a distance of 2 meters from 
him, and it is viewed monocularly. In the left 
tunnel, viewed binocularly, there are five lucite 
rods (each lighted by a 15-watt incandescent 
lamp) at 65-centimeter intervals. A Clason pro- 
jector, on the right side of the apparatus and 
shielded from the subject's view, was used to 
project the image of a picture on the screen in the
tunnel on the right. This projector can alter the 
size of the projected image over a wide range 
without significantly changing its clarity or bright- 
ness. As the image size is increased the picture 
seems to move toward the subject, and as it is 
decreased it appears to move away. It is therefore 
possible to present the subject with a judgmental 
task which seems to involve the spatial placement 
of a picture but which actually revolves about
altering its size on the screen. The experimental 
task was one in which the subject was asked to 
view (with his head in a headrest) a projected 
picture in the right-side tunnel and told that he 
could, by means of a knob, move it forward or 
backwards on a track in order to line it up with 
rods in the left-side tunnel. The instructions were 
as follows: 

  You will be looking at various pictures of
  objects which you will see in front of you. On
  your left you will see some lighted rods. Your
  job will be to turn the knob with your right
  hand and make the object line up with the rod
  I name. I want you to move the picture back
  and forth until it is even with the rod I name.

The size setting made with the knob could be
read from a pointer attached to the lens holder
that moved as the subject turned the knob. A
scale from 1 to 13 was used, with larger values
indicating a larger image and by implication closer
optical placement. The voltage on the bulb in the
Clason projector was kept at a maximum reading
of 120 volts by means of an auto transformer,
thus controlling its 4,250 lumen output.

Six pictures were presented (front view of
clothed male, front view of female nude, rhombus- 
shaped geometric figure, ice cream parfait, front 
view of clothed female, and rear view of male 
nude). They were all line drawings of the same 
height and width; and presented in the sequence 
just enumerated. Judgments of the pictures were 
obtained under six different conditions. Each 
picture was first presented at the apparent furthest 
position from the subject, and he was asked to 
line it up with the rod second closest to him. A 
second series of trials involved telling the subject 
to move the picture from the closest possible 
position to the position of the fourth rod. Thirdly, 
the picture was to be shifted from midway (half- 
way point on size scale) to the fifth rod. Fourth 
in sequence was the task of moving the picture 
from the apparent closest position to the fifth 
rod. Next, the picture was to be moved from mid- 
way to the second rod position. Finally, the sub- 
ject manipulated the picture from the farthest 
position to the apparent position of the third rod. 

Prior to the experiment the subjects were tested 
for visual acuity and astigmatism, respectively, 
by means of a Snellen chart and an astigma sun- 
burst chart. Only those with 20-20 vision and no 
astigmatic defects went on to participate. Five 
minutes of dark adaptation were allowed before 
the T-T task. Following the T-T trials the sub- 
ject was asked to recall the pictures he had seen. 


He then undertook in sequence the Body Promi- 
nence and BFQ tasks. At the end of the session 
he was again asked to recall the T-T pictures. 

The mean size setting of each picture for the 
six trials was computed. Also, the mean rank 
(Rank 1 = largest or “closest” setting) of each 
picture in relation to the other five pictures in the 
series was determined. The number of pictures 
the subject forgot or described with error in 
the two recall tasks was tabulated. Since six er- 
rors were possible for each recall, scores could 
range from 0-12. The purpose of this index was 
to ascertain the degree to which the subject 
seemed to be dealing repressively with the themes 
in the T-T pictures. 
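
The derivation of the mean size setting, the mean rank, and a rank difference index for one subject can be sketched as follows. The sketch assumes that the pictures were ranked within each trial and the ranks then averaged over the six trials, and that tied settings share the mean of the ranks they occupy; neither detail is stated in the procedure. The settings shown are invented, and the function names are hypothetical.

```python
# Pictures in the order of presentation given in the text.
PICTURES = ["clothed male", "female nude", "rhombus",
            "parfait", "clothed female", "male nude"]


def ranks_largest_first(settings):
    """Rank one trial's settings so that Rank 1 is the largest ("closest").

    Assumption: tied settings share the mean of the ranks they occupy.
    """
    ranks = []
    for s in settings:
        larger = sum(1 for t in settings if t > s)
        tied = sum(1 for t in settings if t == s)  # includes s itself
        ranks.append(larger + (tied + 1) / 2)
    return ranks


def mean_by_picture(rows):
    """Average each picture's value (column) over all trials (rows)."""
    n_trials = len(rows)
    return [sum(row[j] for row in rows) / n_trials
            for j in range(len(rows[0]))]


# Invented size settings (scale 1-13): six trials by six pictures.
trials = [
    [6, 9, 5, 7, 8, 4],
    [7, 10, 6, 8, 9, 5],
    [6, 9, 6, 7, 8, 4],
    [5, 8, 4, 6, 7, 3],
    [6, 8, 5, 7, 9, 4],
    [7, 9, 5, 6, 8, 4],
]

mean_settings = mean_by_picture(trials)
mean_ranks = mean_by_picture([ranks_largest_first(t) for t in trials])

# Rank difference index: female nude rank minus male nude rank.
# More negative values mean the female nude was set larger (closer).
female = PICTURES.index("female nude")
male = PICTURES.index("male nude")
nude_rank_difference = mean_ranks[female] - mean_ranks[male]
print(mean_ranks, nude_rank_difference)
```

Because Rank 1 denotes the largest setting, a negative difference indicates that the first-named picture was placed relatively closer than the second.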

The analysis of the data was complicated by 
issues of “defensive style” which have already 
been described in previous T-T studies. Shellow 
(1956) found that subjects manifesting anxious 
involvement in the T-T task made pictures relatively
large (i.e., apparently closer to themselves).
Those without such involvement made pictures
relatively small (i.e., apparently farther away).
Analogous results were obtained by Hastings 
(1952) who noted that insecurity was positively 
correlated with setting pictures relatively “close.” 
Also, Kaufer (in Ittelson & Kutash, 1961) re- 
ported that persons characterized by an anxious 
“moving away” from people put emotional pic- 
tures “closer” to themselves. This contrasted with 
subjects typified by a “moving toward” others 
orientation who put such pictures “farther away.” 
In terms of previous experiments by Ittelson (in 
Kilpatrick, 1952) and Hastorf (1950) it is known 
that a picture presented in the T-T apparatus 
which is more vivid than another requires a 
smaller or “farther away” setting in order to be 
lined up with a spatial reference point. It is there- 
fore likely that the anxiously involved subjects 
who put specific T-T pictures relatively "close" 
to themselves do so because they defensively 
minimize their intensity. An evasive orientation 
under the T-T viewing conditions results in the 
pictures being set larger or “closer” because they
appear subjectively less intense. The pictures
require extra “magnification” to match the stand- 
ard of how large one would expect them to be at 
a given distance. 

The two memory tasks included in the present 
T-T procedure provided a means for determining 
whether the subject took a repressing attitude 
toward the T-T pictures. They made it possible 
to evaluate whether his anxiety was sufficiently 
intense to intrude repressively upon his cognitive 
functioning. The analysis of the T-T data was 
based on a separation of subjects into those mani- 
festing repression in their recall and those dealing 
nonrepressively with the pictures. This approach
was encouraged by exploratory observations indi- 
cating that the relationships between BFQ scores 
and T-T settings were frequently reversed in the 
two groups. Thus, as anticipated, specific BFQ 
scores in the repression group tended to be posi- 
tively correlated with setting given pictures larger 
(apparently closer); while in the nonrepression
group the relations between such BFQ scores and 
T-T settings were in the opposite direction. 

Subjects. The subjects consisted of 54 male 
college students. Their median age was 20. 


Results. A division was made between 
subjects who evidenced no errors in their 
recall of the T-T pictures and those with 
two or more errors. Four subjects who 
made only one error were not included in 
the analysis in order to have a clear cutting 
point between the error and no error 
groups. This categorization derived from 
the formulation that a repressing (for- 
getting) response to the pictures should 
be expressed in a different mode of per- 
ceptual defense than a nonrepressing re- 
sponse. Twenty-five of the subjects proved 
to be Repressors and 29 Nonrepressors. 

The results pertaining to each BFQ 
category will be considered in turn. 

1. Right-Left. The mean BFQ Right score
in the Repression group was 8.7 (σ = 3.4);
and in the Nonrepression group it was 8.1
(σ = 3.2). The T-T index to which the
Right scores were related involved the difference
in rank between the female nude
setting and the male nude setting. This index
was chosen because it evaluates the
degree to which the subject responds selectively
to a heterosexual, as compared to
a nonheterosexual, nudity theme. The coding
of the rank difference scores was such
that the more negative they were the
larger (closer) was the female as compared
to the male setting. In the Repression
group the mean T-T difference score was
−.1 (σ = 3.0); and in the Nonrepression
group it was −.8 (σ = 2.8).


TABLE 12
CHI-SQUARE ANALYSIS OF SIGNIFICANT OR NEAR
SIGNIFICANT RELATIONSHIPS OF BODY ATTENTION
VARIABLES TO THERENESS-THATNESS
INDEXES IN ANXIOUS GROUP

Variable         Thereness-Thatness index                    χ²

BFQ Right        Female nude rank-Male nude rank
                 [cell frequencies illegible]                6.8** (df = 2)

BFQ Back         Female nude rank-Male nude rank
                 [cell frequencies illegible]                7.4** (df = 2)

BFQ Eyes         Sum of nude ranks-Parfait rank
                      H    L
                 H    7    3
                 M    6    3                                 5.3* (df = 2)
                 L    1    5

BFQ Heart        Female nude rank-Female clothed rank
                      H              L
                 H    [illegible]    9                       4.9** (df = 1)
                 L    9              4

a Split at median.
b Split into as equal thirds as possible.
* p < .10.
** p < .05.


TABLE 13
CHI-SQUARE ANALYSIS OF SIGNIFICANT OR NEAR
SIGNIFICANT RELATIONSHIPS OF BODY ATTENTION
VARIABLES TO THERENESS-THATNESS
INDEXES IN NON-ANXIOUS GROUP

Variable         Thereness-Thatness index                    χ²

Body Prominence  Sum of nude ranks-Parfait rank
                      H    L
                 H    4    4
                 M    3    9                                 5.8* (df = 2)
                 L    7    2

BFQ Heart        [index heading not legible]
                      H    L
                 H    6    3
                 M    3    4                                 6.0** (df = 2)
                 L    2   11

a Split at median.
b Split into as equal thirds as possible.
* p < .10.
** p = .05.


A significant chi-square (χ² = 6.8, df =
2, p < .05) was found in the Repression
group between BFQ Right and the T-T 
female nude rank-male nude rank differ- 
ence. That is, the higher the subject's 
Right score the greater was his tendency to 
make the female nude picture larger 
(closer) than the male nude picture. The 
chi-square between BFQ Right and the 
female nude rank-male nude rank differ- 
ence was not significant for the Nonrepres- 
sion subjects. 

2. Front-Back. Mean BFQ Back scores
were respectively 7.2 (σ = 3.5) and 7.8
(σ = 4.5) in the Repression and Nonrepression
groups.

The same female nude rank minus male
nude rank T-T index was used as just described
above. In the present instance it
was intended to tap differential response
to the male (homosexual) as compared to
the female theme. The means and standard
deviations were the same as those cited for
the right-left data.

In the Repression sample a significant
chi-square was found between BFQ Back
and the tendency to make the male nude
larger (closer) than the female nude
(χ² = 7.4, df = 2, p = .025). The equivalent
relationship in the Nonrepression
sample was not significant.

3. Eyes. The mean BFQ Eye score was
7.2 (σ = 2.9) for the Repression subjects
and 7.1 (σ = 2.8) for those in the Nonrepression
category. The T-T index employed
was the average of the ranks of the
settings for the two nude figures minus the
rank of the ice cream parfait setting. It was
intended in this way to determine if the
response to the oral stimulus was different
from the response to the other two most vivid
or ego-involving picture stimuli in the series. The
mean difference score was −.4 (σ = 2.6)
for the Repressors and −1.2 (σ = 2.6) for
the Nonrepressors.

Table 12 indicates that there was in the
Repression sample a borderline relationship
between BFQ Eyes
and the difference between the average of
the nude ranks and the parfait rank (χ² =
5.3, df = 2, p < .10). The higher the subject’s
Eye score the greater the tendency
to set the parfait picture relatively larger
(closer) than the nude pictures.

The equivalent relationship in the Non- 
repression group was of a completely 
chance order. 

4. Body Prominence. Mean Prominence 
scores in the Repression and Nonrepression 
categories were respectively 2.5 (σ = 2.0)
and 2.8 (σ = 2.0).

The same index for evaluating the re- 
sponse to the ice cream parfait was used 
as described above in the analysis of the 
BFQ Eye data. Means and standard devi- 
ations of the distributions were also the 
same. A chance relationship was observed 
in the Repression group between Promi- 
nence and the relative setting of the par- 
fait picture. But the results in the Nonre- 
pression sample (Table 13) indicated that 
the chi-square depicting the relation between
Prominence and the parfait index
was 5.8 (in the predicted direction), which 
is just short of the 6.0 needed for signifi- 
cance at the .05 level. The greater the 
subject's body awareness the smaller (fur- 
ther away) did he set the parfait as com- 
pared to the nude pictures. It is important 
to keep in mind that the prediction of the 
direction of relationship between the body
image and T-T variables was such as to 
expect the trend in the Nonrepression 
group to be opposite to that in the Re- 
pression group. 

5. Heart. Mean BFQ Heart scores were
5.9 (σ = 4.1) for the Repressors and 3.2
(σ = 3.4) for the Nonrepressors.

The T-T index chosen to tap the subject’s
reaction to a theme of openly displayed
sexuality was the difference between
the rank of his setting of the nude female
picture and the rank of his setting of the
clothed female picture. His response to a
minimally sexualized clothed female figure
could thus be compared with that to a nude,
maximally sexualized female figure. In
the Repression category the mean T-T index
was −.4 (σ = 3.1) and for the Nonrepressors
it was +.1 (σ = 2.7). In the Repression
group the chi-square describing the relation of BFQ
Heart and the difference between nude
female and clothed female ranks was significant
(χ² = 4.9, df = 1, p < .05).
The higher the subject’s Heart score the
more likely he was to set the nude female
larger (closer) than the clothed female. In
the Nonrepression group the equivalent
relationship was also significant (χ² =
6.0, df = 2, p = .05); and, as anticipated,
it was in the opposite direction. Contrasting
with the trend for the Repressors, one
finds here that the greater the Heart awareness
the smaller (further away) the nude
is set as compared to the clothed female.
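
As a check on how such chi-square values arise from the cell frequencies, the computation can be sketched for the BFQ Heart frequencies printed in Table 13 (Nonrepression group). The sketch assumes the ordinary Pearson statistic without continuity correction, which the text does not state explicitly; the function name is hypothetical.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a frequency table.

    Assumption: no continuity correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Rows: BFQ Heart high, middle, low; columns: T-T index high, low
# (frequencies as printed in Table 13).
heart_nonrepressors = [[6, 3],
                       [3, 4],
                       [2, 11]]
statistic, df = chi_square(heart_nonrepressors)
print(round(statistic, 2), df)
```

Under these assumptions the statistic comes to roughly 6.0 with 2 degrees of freedom, in agreement with the tabled value; the critical value at the .05 level for 2 degrees of freedom is 5.99.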

Discussion. Four of 10 predictions re- 
garding the relationships of the body at- 
tention and T-T picture setting variables 
were significantly supported, and 2 were 
supported at a borderline level. Within 
the Repression sample favorable results 
were found for four of the five hypotheses; 
while only two of five were favorable for 
the Nonrepressors. Apparently, the reaction
aroused by the T-T task in the Repressors
results in a kind of involvement
permitting body-image related attitudes to 
have an impact on their judgments of the 
pictures. Such involvement is much less 
evident for the Nonrepressors. The fact 
that there are differences in results revolv- 
ing about the Repression-Nonrepression 
distinction confirms previous reports that
the subject’s type of involvement in the 
T-T task plays a role in the perceptual 
defense strategy he displays. A number of 
the findings, though significant, were really 
of a borderline character. However, what 
is impressive is the fact that the body at- 
tention variables predicted the direction of 
response trends for so many aspects of a 
small number of pictures. The results are 
encouraging as exploratory indications 
that the way in which an individual dis- 
tributes his attention to his body is re- 
flected in selective Thereness-Thatness 
perception. 

The findings for BFQ Heart were the 
most consistent, with significant trends 
present for both the Repressors and Non- 
repressors. In the former group the greater 
the individual’s heart awareness the larger 
(closer) was his T-T setting of the nude 
female as compared to the clothed female, 
and in the latter group the obverse rela- 
tionship appeared. This exaggerated re- 
sponse to the image of the nude female 
supports previously cited data which sug- 
gested that heart awareness is related to 
issues of puritanical morality with accompanying
concern about sexual propriety.

Moderate support was obtained for the 
hypotheses related to BFQ Right and 
BFQ Back. In both instances there were 
significant results in the predicted direc- 
tion in the Repression sample, although not 
so in the Nonrepression sample. The higher 
a Repressor’s BFQ Right score the larger 
(closer) was his setting of the female nude 
in relation to the male nude. However, the 
higher his BFQ Back score the larger was 
his setting of the male nude relative to the 
female nude. Awareness of the right side of
the body goes along with a defensive reac- 
tion to a heterosexual stimulus, and aware- 
ness of the back is accompanied by 
defensiveness in dealing with the “homosexual” image. 

The results pertaining to Prominence 
and BFQ Eyes tended in the predicted di- 
rection, but not significantly so. The Prom- 
inence data were within a hairsbreadth of 
supporting the hypothesis in the Nonre- 
pression category but not in the Repression 
category. The fact that the poorest results 
occurred for two variables which were both 
appraised with T-T response to the ice 
cream parfait (oral theme) raises the ques- 
tion whether this picture was inadequate in 
its arousal properties. The T-T pictures 
were achromatic drawings, and there are 
hints that it may be difficult to depict food 
vividly without the use of color. Analogous 
difficulty would not seem to apply to the 
same extent in representing heterosexual or 
homosexual themes. This matter will be ex- 
plored further by making use of colored 
food pictures in future work. 


DISCUSSION OF OVERALL FINDINGS 


Despite gaps in the data, there is a train 
of consistency and overlapping support for 
hypotheses which indicates that the manner 
in which an individual distributes atten- 
tion to his body is related to his conflicts 
and defenses. Relationships have been 
demonstrated that diversely involve total 
body awareness, broad spatial dimensions 
of the body (viz., right-left, front-back), 
and specific organs (viz., eyes, heart). The 
allocation of body attention seems to be a 
carefully monitored process in which distinctions 
are made between sectors in terms of their meanings 
and valences. Indeed, the specificity of the conflicts 
associated with focusing upon certain body 
areas highlights the feasibility of using a 
person’s body perceptions to obtain infor- 
mation about his personality. What are the 
origins of the relationships that appar- 
ently exist between allocation of body 
awareness and personality variables? What 
significance do these relationships have 
for the behavior of the individual? Cur- 
rently one must approach such questions 
speculatively. 

It is possible to conceive of multiple 
ways in which psychological attitudes and 
conflicts might become linked with amount 
of attention devoted to a body area. 

1. The link might derive from the fact 
that the same parental attitudes which 
shape one's personality early in life are also 
expressed in the treatment and restrictions 
applied to areas of one's body in terms of 
cleaning, touching, watching, "covering 
up," and so forth. For example, parental 
attitudes which declare the badness of sex 
might encourage traits like shyness or 
passivity in a man and also produce con- 
ditions which by preventing him from 
freely seeing and touching his genitals 
lead to vague awareness of, or decreased 
attention toward, them. This sort of 
“historical” explanation is, of course, par- 
ticularly favored in the psychoanalytic 
literature. 

2. Another possibility is that one’s body 
is uniquely close to the ego or self and 
therefore likely (as is true of many ego- 
significant targets) to become in whole or 
part a “screen” upon which one projects 
attitudes about self and the world. This 
would be exemplified by the individual 
who feels inferior and therefore unrealisti- 
cally perceives his body or some part of it 
as small. 

3. A third alternative would liken aware- 
ness of a body area to the experience that 
goes along with tensing a part of one’s 
body in preparation for an act. An angry 
person preparing to hit someone might perceive 
unusual tension in his biceps as he gets ready to 
swing. In this way, a persisting wish to obtain 
certain goals might be accompanied by a chronic 
increased subjective awareness of the “tensed” or 
“alerted” body areas used in attaining such goals. 
A relationship of this type between body awareness 
and psychological 
parameters can, of course, be conceptual- 
ized in terms of the immediate situation 
without appeal to early developmental in- 
fluences. 

4. Obversely, awareness of a body area 
might represent not preparation for an act 
but rather watchful inhibition to assure 
oneself that an act will not be committed. 
The person afraid of acting out an angry 
wish might monitor the body areas involved 
in expressing aggression to make 
sure they are not used for that purpose. 
This formulation can be extended to in- 
clude ambivalence about a goal. In such a 
case, the body awareness might alternately 
represent the perception of preparation to 
act and then the determination to inhibit 
the action. 

5. Still another possibility is that the 
awareness of a body area might provide a 
devious way of partially satisfying a 
“forbidden” wish. Conceivably, an indi- 
vidual who had learned fear of direct 
sexual expression might gain some gratifi- 
cation from the persistent sensations asso- 
ciated with an apparently “uncontrolled” 
awareness of his genitals, even if anxiety 
were prominently involved. Many of the 
phenomena of hypochondriasis could fit 
this category. 

6. Wishes or fantasies which are for- 
bidden might arouse guilt and the expecta- 
tion that retaliatory damage will be 
applied to body areas involved in the satis- 
faction of the wishes. Body awareness in 
this instance would represent an anxious 
watching and guarding of an area to make 
sure that it did not get attacked or dam- 
aged. Freud’s concept of “castration anxi- 
ety” illustrates this possibility. 

7. Most speculative of all, as described 
in a previous paper (Fisher, 1959) is the 
view that an individual’s awareness of a 
body area might reflect his incipient at- 
tempts to “try out" body responses or 
modes of expression related to a wish. A 
person with repressed anger who for some 
reason became less blocked in this respect 
might develop increased awareness of muscles 
in his arms as he privately (unconsciously) 
rehearsed what it would feel like 
to hit someone. Or a person who was guilty 
about incorporation might develop height- 
ened awareness of his mouth as the result 
of greater freedom to “try out” the sensa- 
tions of “taking in” and biting. The "trying 
out” could actually be a way of deciding 
whether to go on to more open and active 
forms of the response. There is some simi- 
larity between this concept and formula- 
tions in which thinking is equated with 
minute muscular movements in the throat 
and elsewhere (e.g., Washburn, 1916). 

If one considers the multiple ways in 
which body perception might become tied 
in with goals and wishes, it is clear that one 
is dealing with a system of high complex- 
ity. What general role might this system 
play in behavior? It has been suggested 
that there are specialized ways in which 
the awareness of a body region could com- 
plement a wish or conflict. For example, it 
might serve as a substitute for other kinds 
of body experiences or permit a covert 
"trying out" of new forms of wished-for 
expression. But in viewing the function of 
the system as a whole, it has been conjec- 
tured elsewhere (Fisher, 1965b) that it 
provides persisting signals that introduce 
selectivity in cognition and perception. 
The patterned awareness of certain body 
parts which have defined connotations over 
time may be regarded as an organized 
complex of peripheral cues which guide 
responses. A brief quote from a paper by 
Fisher (Fisher, 1965b, p. 539) summarizes 
this idea: 


The body scheme may be conceptualized as a 
representation in body experience terms of atti- 
tudes the individual has adopted. These are ex- 
periences coded as patterns of body activation 
(e.g., involving muscle, stomach). It may be 
presumed that the patterns of body activation 
exist as circuits based on the following sequence: 
perceptual focus upon a body area because of its 
utility or significance or activation in relation to 
a goal; increased physiological and also sensory 
arousal of the area as a consequence of its special 
prominence; further feedback from such arousal 
to the subsystem in the CNS involved in the 
original highlighting of the area. Thus, the indi- 
vidual's body scheme contains landmarks which 
reiterate to him that certain things are important 
and others are not. Just as a contracting stomach 
is a signal to seek food, the perceptual prominence 
of certain muscles maintained at high tonus may 
be a reminder to attend or not to attend to some 
class of objects. 


This formulation grew out of a series of 
studies in which it was demonstrated that 
intensity of awareness of specific body areas 
was predictive of selective cognition and 
memory for certain classes of information. 

In these terms, one may regard each of 
the body sectors in the present study which 
has shown a meaningful connection with 
personal attitudes as part of a system 
providing sensory cues which “feed back” 
to guide interpretation of the environment. 
The awareness of the back might serve to 
“remind” the individual that he must 
maintain self control, not “soil,” and avoid 
certain types of relationships with men. 
The prominence of the heart might re- 
peatedly signal the importance of behaving 
in a virtuous way and avoiding stimuli 
that are “bad” temptations. Or sensations 
from one’s eyes could function to inhibit 
approach to situations which would stimu- 
late oral incorporative wishes. The possi- 
bility that body sensations might function 
to modify perception and cognition in this 
fashion has been considered at length by 
Solley and Murphy (1960). They noted 
that Solomon and Wynne (1954) had 
found that avoidance conditioning in dogs 
was influenced by amount of visceral 
autonomic feedback. Solomon and Wynne 
had actually concluded that “at least, some 
of the afferent feedback impulses from 
the viscera have the properties of stimuli 
and so are capable of becoming conditioned 
stimuli and drive stimuli [p. 369].” Solley 
and Murphy indicated that proprioceptive 
feedback might function similarly; and 
proposed that a percept could get linked 
or locked to proprioceptive and autonomic 
feedback mechanisms “so that the percept 
and the feedback mechanisms are mutually 
excitatory [p. 243].” Illustratively, it 
was suggested that the continuous tighten- 
ing of certain muscles might act chroni- 
cally to inhibit specific anxiety arousing 
memories. “The painful memories are 
‘locked’ in a state of unawareness by the 
incessant feedback from the tightened 
muscles [p. 244].” Quite analogously, other 
investigators have found that cues de- 
rived from body sensations may influence 
perception and learning. Level of muscle 
tonus, asymmetry of tonus, position of 
body in space, body deformity, amount of 
autonomic arousal, and body sensations 
derived from drug effects have all proven 
to be variables that significantly affect cog- 
nitive processes (e.g., Belleville, 1964; 
Callaway & Dembo, 1958; Hinckley & 
Rethlingshafer, 1951; McFarland, 1958; 
Werner & Wapner, 1952). The body image 
emerges in the model just described as a 
framework of meanings assigned to body 
areas which in turn are accompanied by 
sensory signals that have a selective impact 
upon perception and cognition. In assign- 
ing such importance to body signals out- 
side of the CNS, one swings back in the 
direction of peripheral theories of thought 
and affect which were supported vigorously 
at one time by Titchener (1924), Wash- 
burn (1916), James (1892), Jacobson 
(1929), Freeman (1948), Guthrie (1952) 
and others. While theories emphasizing 
central factors are currently more accepta- 
ble and in vogue, it has been observed by 
several reviewers (e.g., Gellhorn, 1964) 
that the contribution of peripheral factors 
has not yet been adequately evaluated. 


SUMMARY 


The major findings that have emerged 
are as follows: 

1. The greater a man’s focus of attention 
upon the right side of his body, the less 
active he is heterosexually as defined by 
his own self-reports; the higher is his level 
of anxiety about stimuli with threatening 
sex-role connotations; and the less free he 
is in expressing sexual ideas. It was not 
possible to show a consistent relationship 
between right awareness and anxiety about 
sexual themes as depicted by a measure de- 
rived from the Blacky Pictures. 

2. Degree of attention directed by the 
subject to the back of his body proved to 
be positively correlated with the following: 
his motivation for exercising careful con- 
trol in the expression of impulses; his 
tendency to convert hostility into nega- 
tivism; his level of anxiety about anal 
themes; and his popularity among peers 
as measured by his self-reports. The inten- 
sity of his back focus was negatively cor- 
related with his recall of how openly his 
father acted out anger, but the relation- 
ship was of a chance order with respect to 
his recall of his mother’s expression of hos- 
tility. The hypothesis was not supported 
that back awareness is negatively corre- 
lated with open aggressiveness. In general, 
the results accorded with the view that 
the front-back dimension is meaningfully 
linked with Freud’s “anal character” 
typology. 

3. It was possible to demonstrate that 
back awareness is greater in paranoid 
than nonparanoid schizophrenics. This 
finding was seen as congruent with 
Freud’s formulation concerning the im- 
portance of homosexual conflict in para- 
noia. 

4. The hypothesis was supported that 
the greater the focus of attention on the 
back as compared to the front of the 
body the relatively higher will be the 
physiological activation of the first in re- 
lation to the second (defined by skin re- 
sistance). 

5. Intensity of eye awareness was nega- 
tively correlated with interest in eating 
and the selective tendency to recall food 
words. It was positively linked with de- 
gree of anxiety aroused by an oral sa- 
distic theme in the Blacky Pictures series. 
Relatedly, it was negatively correlated 
with how generous father was recalled as 
having been. A similar nonsignificant 
trend was found with reference to the re- 
call of mother’s generosity. The data cer- 
tainly suggest that one’s degree of eye 
awareness is directly connected with how 
much conflict one has about incorpora- 
tion. 

6. The prominence of the individual’s 
own body in his perceptual field also 
seems to be tied in with incorporative 
difficulties. Body Prominence scores proved 
to be negatively correlated with interest in 
food and number of foods liked and positively 
so with amount of anxiety apparently 
aroused by the Blacky Oral Eroticism 
picture. However, they were not 
significantly correlated with reactions to 
the Blacky Oral Sadism picture or with 
recall of parental selfishness. 

7. There was good substantiation that 
heart awareness is positively correlated 
with the individual’s religiosity and his 
perception of how religious his parents had 
been. It was also found to be related to 
difficulty in recalling words with guilt 
connotations, but was not correlated with 
response to the Blacky Guilt Feelings 
picture. It was erratically related to a 
number of measures involving sexual 
attitudes and behavior. Finally, it was 
noted to be negatively correlated with 
aesthetic interests, as measured by the 
Study of Values. The findings were con- 
sidered to affirm the view that heightened 
concentration of attention upon one’s 
heart is part of an orientation which in- 
volves religiosity, narrowed perspective on 
the world, and increased guilt. 


8. The body-attention dimensions were, 
to an encouraging degree, able to predict 
selective perceptual responses to pictured 
themes presented in the Ames Thereness-Thatness 
apparatus. 

9. The overall results indicated that the 
individual's manner of distributing atten- 
tion to his body is intimately related to 
the traits, conflicts, and personality defenses 
characterizing him. 
4 experiments investigated the influence of a given magnitude of reward as 
a function of S’s contemporary or previous experience with a different reward 
magnitude. The orthogonal variables studied included type of initial expe- 
rience with reward (consummatory versus consummatory-plus-instrumen- 
tal), response measure, apparatus, and intertrial interval. In addition to 
several points relevant to method, this research determined the following: 
(a) Latent learning of reward magnitude may be reflected as a simultaneous- 
contrast effect. (b) Reduction in reward associated with 1 stimulus may be 
accompanied by reduction in the rate of responding to another stimulus. 
(c) A simultaneous-contrast effect exists when choice is between some and 
no reward. (d) A simultaneous-elation effect does not appear corresponding 
to the simultaneous-depression effect. (e) Behavior following a shift in the 
magnitude of reward associated with a given stimulus may be determined in 
part by the magnitude of reward previously associated with another, 
discriminably different, stimulus. 


The effect on behavior of a given condition 
of reinforcement depends upon 
the subject’s (S's) history with other or the 
same conditions of reinforcement. One con- 
vincing example of this has been the 
demonstration of "contrast effects" (CEs) 
following shifts in the magnitude of re- 
ward. These have been labeled “depres- 
sion" and “elation” effects by Crespi 
(1942) and are defined when the per- 
formance of Ss exposed to such shifts 
drops below or rises above the performance 
of controls exposed to only a single reward 
magnitude. 
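
The definition above can be restated as a small decision rule. The following is only a sketch: the function name, the tolerance parameter, and the example speeds are hypothetical, and an actual test of a contrast effect would rest on a statistical comparison of groups rather than on a bare difference of means.

```python
def classify_contrast(shifted_mean: float, control_mean: float,
                      tolerance: float = 0.0) -> str:
    """Label a shifted group's performance relative to unshifted controls.

    Both means are measured at the same (final) reward magnitude.
    `tolerance` is a hypothetical stand-in for a statistical criterion.
    """
    diff = shifted_mean - control_mean
    if diff < -tolerance:
        return "depression"   # shifted Ss fall below the controls
    if diff > tolerance:
        return "elation"      # shifted Ss rise above the controls
    return "none"


# Hypothetical running speeds (ft/sec), for illustration only.
print(classify_contrast(shifted_mean=1.2, control_mean=1.6))  # depression
print(classify_contrast(shifted_mean=1.9, control_mean=1.6))  # elation
```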

Contrast effects generally have been in- 
terpreted from either of two theoretical 
frameworks. Crespi (1942), Bower (1961), 
and others have viewed them as emotional 
effects. Others (e.g., Bevan, 1963) have 
considered a perceptual source of CEs. 
These interpretations, however, have been 
hampered by a lack of consistent data and 
a narrow range of response measures. The 
present experiments were part of a series 
intended to add depth and breadth to the 
available relevant data. The strategy was 
similar to that of Spear (1964), Spear and 
Hill (1965), and Spear and Pavlik (1966); 
and the present experiments were directed 
specifically to a thorough examination of 
the paradigm for "simultaneous and successive 
contrast effects" (hereinafter referred 
to as SimCEs and SucCEs). 

Spear and Hill (1965) considered two 
operationally distinct paradigms within 
the study of CEs of reinforcement conditions. 
In a SucCE, two successive stages of 
training differ only in terms of the conditions 
of reinforcement (in this case, magnitude 
of reward) associated with a particular 
stimulus-response (S-R) event. A SucCE 
is said to occur when performance during 
the second stage of training is inversely re- 
lated to the magnitude of reward experi- 
enced during the initial stage. Tests of the 
SimCE, on the other hand, compare per- 
formance to two discriminably different 
stimuli which are associated with differen- 
tial magnitudes of reward. In this para- 
digm, relative performance is measured 
when the stimuli are presented singly, and 
free choice is measured when the stimuli 
are presented simultaneously. A SimCE 
is defined when performance to a given 
stimulus is inversely related to the reward 
magnitude associated with the alternative 
stimulus. 

These paradigms were combined in a 
single design by Spear and Hill (1965). 
Rats experienced a large and a small re- 
ward in the respective alternatives of a T 
maze during the SimCE test. Then, for the 
SucCE portion, the larger reward was re- 
duced. A SimCE was readily obtained, and 
the SucCE was numerically present in 
terms of running speed but not in choice 
behavior. 

There are several advantages of combining 
the tests for SimCE and SucCE 
within one experiment. First, the relative 
robustness of these effects can be com- 
pared. The particular objectives of the 
Spear and Hill experiments did not permit 
an entirely adequate comparison in this 
respect. Their finding of a relatively 
weaker SucCE may have been due to the 
prior experience of Ss with the Postshift 
reward magnitude and/or the accompany- 
ing SimCE. Indeed, either an emotion- or 
a perception-based explanation of CEs 
would predict this (see General Discus- 
sion). The present Experiment IV more 
adequately estimated the relative strengths 
of these CEs. 

Combining the SimCE and SucCE para- 
digms also aids in determining the extent to 
which a CE is stimulus specific. Thus a 
tentative decision becomes feasible regard- 
ing the pervasiveness of the CE: Is a great 
deal of S’s contemporary behavior affected 
by a particular occurrence of a CE, or are 
relatively disjoint portions of his behavior 
unaffected? The former would be the case 
if, following a decrease in the reward on 
one alternative, it were found that S’s 
performance declined in terms of responses 
other than those directly associated with 
the shift (for example, if running speed 
were also found to decrease in the alterna- 
tive in which reward is unchanged). This 
possibility, though unsupported by the 
data of Spear and Hill, is important in view 
of the response-ubiquitous partial-rein- 
forcement effect in extinction (Spear, 
1964; Spear & Pavlik, 1966). This latter 
phenomenon is defined by increased resistance 
to extinction of a formerly continuously 
reinforced response as a consequence 
of S’s experience with some other 
partially reinforced event. Whether the CE 
is also ubiquitous throughout S’s behaviors 
can be determined by the combined test 
for SimCE and SucCE. 

Finally, a combined test can measure 
CEs in terms of choice behavior. For ex- 
ample, when S is presented with two 
nominally equal alternative reward mag- 
nitudes, will he ever show less preference 
for the alternative which formerly was 
associated with a larger reward? The CE 
in lower animals has typically been esti- 
mated via a vigor measure, but inertial and 
physiological limits of such a measure 
have created methodological difficulties 
(cf. Knarr & Collier, 1962; Spence, 1956). 
A preference measure avoids these problems, 
and this fact initially prompted the 
present use of the T maze. 

Three general objectives covered in the 
following four experiments, then, are (a) 
to estimate the relative magnitude of the 
SimCE and SucCE, (b) to determine the 
extent to which relatively dissociated be- 
haviors are affected by a reduction in re- 
ward, and (c) to further investigate the 
possibility of a SucCE in choice. Experiments 
I and IV concerned specific variables 
believed relevant to characteristics of the 
SimCE and SucCE, and Experiments II 
and III were conducted to clarify points 
of methodology. 


EXPERIMENT I 


The first experiment was concerned 
specifically with this point: Will a SucCE 
occur in choice behavior? The available 
data would require a negative answer 
(Spear, 1964; Spear & Hill, 1964, 1965). 
However, it seemed likely that the specific 
procedures employed may not have maxi- 
mized the opportunity for this effect to 
appear. Of several problems, consider the 
following: 

The occurrence of a CE in choices would 
appear to be a joint function of certain 
temporal properties of the SimCE and 
SucCE. Assume that choice is dictated by 
relative response strength in the alterna- 
tives and that response strength is some 
linear function of running speed. Now, 
when S chooses between 12 and 1 pellets 
during Preshift and is shifted to 1 pellet 
on each alternative during Postshift, we 
know that two events will occur: response 
strength on the less favorable alternative 
(LFA) before the shift will be less than 
expected for 1 pellet, and response strength 
on the formerly more favorable alternative 
(MFA) after the shift will be about 
equally less than expected for 1 pellet. We 
also know that these response strengths 
will ultimately adjust to the same appropriate 
level. Thus, if the SimCE and SucCE 
are equal in magnitude, the critical 
requirement for a CE in choices appears 
to be that the response strength to 
the LFA must adjust more rapidly than 
that to the MFA following the successive 
shift in reward. In other words, at least 
one of two events must occur: either the 
SucCE must depress MFA responding 
more than the SimCE depresses LFA re- 
sponding, or S must recover from the 
SimCE sooner than from the SucCE. As 
a related consideration, note that the CE 
on choice would be defined when S responds 
with less than 50% preference for 
the originally MFA once reward is equated 
in the alternatives. For this to occur, the 
SucCE must either overcome the inertia of 
choosing the MFA (cf. Knarr & Collier, 
1962) or outlast it. The probability of the 
latter is minimized by the transitory nature 
of the CE (see Gonzalez, Gleitman, & 
Bitterman, 1962). Thus it would seem that 
the greater the likelihood of overcoming 
the inertia of choosing the MFA, the 
more favorable the conditions of a CE in 
choice behavior. 
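
The argument in this paragraph can be illustrated with a toy model. The sketch below assumes, purely for illustration, that each depressed response strength recovers exponentially toward the level appropriate to 1 pellet; the exponential form, the function names, and every numerical value are hypothetical and are not taken from the experiments. It shows only the logical point made above: with equal initial depressions, the formerly more favorable alternative (MFA) falls below the less favorable alternative (LFA) only if the LFA recovers faster, whereas a larger SucCE produces the same outcome even with equal recovery rates.

```python
import math


def strength(target: float, depression: float, rate: float, trial: int) -> float:
    """Response strength on a Postshift trial, recovering toward `target`.

    Exponential recovery is an assumption made for this sketch only.
    """
    return target - depression * math.exp(-rate * trial)


def choice_contrast(target: float, d_sim: float, d_suc: float,
                    rate_lfa: float, rate_mfa: float, trial: int) -> bool:
    """True if the MFA is weaker than the LFA on the given trial.

    d_sim: depression of LFA responding attributed to the SimCE.
    d_suc: depression of MFA responding attributed to the SucCE.
    """
    lfa = strength(target, d_sim, rate_lfa, trial)
    mfa = strength(target, d_suc, rate_mfa, trial)
    return mfa < lfa


# Equal depressions, faster recovery on the LFA: a CE in choice is predicted.
print(choice_contrast(1.0, 0.3, 0.3, rate_lfa=0.5, rate_mfa=0.1, trial=5))  # True
# Equal depressions and equal recovery rates: no CE in choice.
print(choice_contrast(1.0, 0.3, 0.3, rate_lfa=0.1, rate_mfa=0.1, trial=5))  # False
# Larger SucCE than SimCE with equal rates: a CE in choice is predicted.
print(choice_contrast(1.0, 0.1, 0.3, rate_lfa=0.1, rate_mfa=0.1, trial=5))  # True
```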

Therefore, the first experiment was per- 
formed within this general framework, 
and an attempt was made to minimize the 
SimCE while maximizing the SucCE. It 
was felt that this might be accomplished 
by giving S Preshift experience with the 
contrasting rewards associated with the 
distinctive stimuli but without having S 
respond differentially to the stimuli, except 
in terms of consummatory behavior. Thus, 
S would enter the Postshift stage without 
a history of differential running and choice 
behavior to the alternatives (including the 
typical SimCE) and without the strong 
initial tendency, or “response inertia,” to 
turn toward the MFA from the choice 
point on free trials. 

Therefore, in addition to the groups 
given conventional Preshift trials—Group 
12-1 (receiving 12 pellets on one alterna- 
tive and 1 on the other) and Group 1-1 
(receiving 1 pellet on either side)—two 
comparable groups were only placed in, 
but not run to, the different goal boxes 
during Preshift. During Postshift, Ss in all 
groups were given conventional trials with 
only 1 pellet in each alternative. To the 
extent that the discrimination was estab- 
lished during Preshift training, a SucCE in 
choice behavior was expected to occur in 
the "placed" Ss since they were not subject 
to interference from initial response tend- 
encies built up by choosing the MFA during 
Preshift. 


Method 


Subjects. The Ss were 64 experimentally naive 
female albino rats of the Sprague-Dawley strain, 
approximately 60 days old at the start of training 
and weighing 180-200 gm. 

Apparatus. The T maze, painted flat black except 
for the clear Plexiglas top, is shown in Figure 1. 
The 1-ft. start box was separated from a 1-ft. 
stem by a Plexiglas guillotine door. Each 
2¥%-ft, arm contained a food cup 8 in. from its 
end. At the termination of each arm was a parti- 
tion not quite as high as the top of the maze be- 
yond which was located a dish of reward pellets to 
equalize any olfactory cues. Guillotine doors, used 
to prevent retracing, were located in each arm. 
One such door was 1 in. beyond the choice point, 
and the other was 21 in. from the choice point (1 
ft. from the food cup). All interior sections were 4 
in. wide and 4 in. high. 

Fig. 1. T maze employed in Experiments I, II, 
and IV. PC = photocell; B = bowl of pellets included 
to mask odor; D = door; MS = microswitch; 
F = food cup. 

In order to make the alternatives more discriminably 
different, the floor and 1% in. of the 
walls of each arm from the food cup to the center 
of the choice point were covered with an interchangeable 
¼-in. Masonite insert. The presence of 
inserts necessitated a ¼-in. “step up” as S entered 
the choice-point section. One set of inserts (always 
located in the right arm in Experiment I) 
was built with the rough surface of the Masonite 
facing up and was painted dark brown. The second 
set of inserts (always in the left arm) had the 
smooth surface up and was painted white. Thus, 
textural differences confounded with brightness 
(or color) differences were provided. 

Times were recorded on Standard Electric Timers 
in .01 sec. Response measures included stem 
time (from the raising of the start door to the in- 
terruption of a photobeam 6 in. past the start 
door), turning time (from the first photobeam to 
the second located 6 in. beyond the choice point 
in either arm), committed time (from the second 
photobeam to the third located 24 in. past the 
second and 3 in. before the food cup), and choices 
on free trials. Times were converted to speeds by 
a reciprocal transformation, and speeds are re- 
ported in ft/sec. 
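
The conversion from recorded times to speeds can be sketched as follows. The 6-in. stem segment and the 24-in. committed segment are taken from the description above; the function and constant names, and the example times, are hypothetical. The sketch assumes the speed for a segment is simply its length in feet divided by the recorded time.

```python
# Segment lengths in inches, as given in the text.
STEM_IN = 6.0        # start door to first photobeam
COMMITTED_IN = 24.0  # second photobeam to third photobeam


def speed_ft_per_sec(distance_in: float, time_sec: float) -> float:
    """Convert a segment time to a speed (ft/sec) by a reciprocal transformation."""
    if time_sec <= 0:
        raise ValueError("time must be positive")
    return (distance_in / 12.0) / time_sec


# Hypothetical times, for illustration only.
print(speed_ft_per_sec(COMMITTED_IN, 2.0))  # 1.0
print(speed_ft_per_sec(STEM_IN, 0.25))      # 2.0
```

One property of the reciprocal transformation is that very long times map to speeds near zero, so occasional slow trials are compressed rather than dominating a mean.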

The T maze was in a dimly lit cubicle which 
contained an exhaust fan providing masking noise 
as well as ventilation. The Ss were maintained, 
one to a cage, in a standard cage rack in the 
colony room under constant, bright illumination. 
When running the Ss, the experimenter (E) 
moved the cage rack into a dim passageway just 
outside the testing cubicle. On each trial, S was 
placed directly in the maze from its home cage 
and immediately returned at the completion. 

Procedure. Upon arrival, S was placed on a 
deprivation schedule which was maintained 
throughout the experiment. The S received daily 
10 gm. of finely ground Purina lab chow and 40 
45-mg. regular Noyes pellets with ad lib access to 
water. Pellets consumed in the T maze during 
training were subtracted from the total, and the 
remainder, along with chow, was given in the home 
cage 20-30 min. after the daily trials. On Days 2-7, 
each S was prehandled (placed for 3 min. in a 
large, black box, during which time S was lifted 
and replaced by E five times). 
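
The daily ration arithmetic can be checked with a short sketch. It assumes six rewarded trials per day, three on each alternative, as specified for training in this experiment; the function and constant names are hypothetical.

```python
DAILY_PELLETS = 40   # 45-mg Noyes pellets allotted per day
PELLET_MG = 45
CHOW_MG = 10_000     # 10 gm. of ground lab chow per day


def maze_pellets(mfa_pellets: int, lfa_pellets: int, trials_each: int = 3) -> int:
    """Pellets eaten in the maze in one day, assuming every reward is consumed."""
    return trials_each * (mfa_pellets + lfa_pellets)


def home_cage_pellets(eaten_in_maze: int) -> int:
    """Pellets still owed in the home cage after the daily trials."""
    if not 0 <= eaten_in_maze <= DAILY_PELLETS:
        raise ValueError("maze intake must lie between 0 and the daily allotment")
    return DAILY_PELLETS - eaten_in_maze


print(home_cage_pellets(maze_pellets(12, 1)))  # 1  (12-1 reward condition)
print(home_cage_pellets(maze_pellets(1, 1)))   # 34 (1-1 reward condition)
print(CHOW_MG + DAILY_PELLETS * PELLET_MG)     # 11800 mg of food per day
```

On these assumptions the 40-pellet allotment just covers the 39 pellets a 12-1 subject can earn in a day, so total daily intake is the same in every group.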

The Ss were run in two replications of 32 Ss 
each, and each replication contained 8 Ss from 
each of the four groups. A 2 X 2 factorial design 
was employed, varying nature of Preshift trials 
and magnitude of reward on the MFA. During 
Preshift (the first 48 trials of training at 6 trials 
per day), Group 12-1-Run received 12 45-mg. pel- 
lets on the MFA and 1 pellet on the LFA. Group 
1-1-Run received one pellet on either alternative 
during Preshift, the designations MFA and LFA 
being randomly assigned on the same basis as in 
12-1. 

Both Groups 12-1-Run and 1-1-Run received 
their Preshift trials in a typical manner, being 
placed in the start box and allowed to run to a 
goal box. Groups 12-1-Placed and 1-1-Placed re- 
ceived Preshift reward conditions identical to 
Groups 12-1-Run and 1-1-Run, respectively, but 
did not run to their rewards. Rather, on each 
trial, they were placed in the appropriate arm, 
facing the food cup about 3 in. from it. The Ss in 
all groups were removed from the maze as soon as 
the reward was consumed, provided they remained 
a minimum of 15 sec. and no longer than 3 min. 
The order of placements for Placed Ss was random 
during Preshift with the stipulation that the six 
trials of each day contain three placements to the 
MFA and three to the LFA. On each trial, Run Ss 
were put in the start box facing away from the 
start door. After 3 sec., E raised the start door 
regardless of S's orientation. If S failed to stop a 
clock in 2 min., that and any other unstopped 
clocks were recorded as 2 min., and S was placed 
in the appropriate arm. If S made no choice on a 
free trial, he was randomly assigned to one arm. 
For the Run Ss during Preshift, Trials 1 and 5 of 
each day were free (S could enter either arm), 
and Trials 2 and 6 were forced (a closed door at 
the choice point prevented access to one arm) to 
the side opposite that chosen on the previous 
trial. Trials 3 and 4 of each day were forced ran- 
domly, one to each alternative, with the stipula- 
tion that half of the Ss in each group on each of 
Trials 3 and 4 be forced to the MFA and half to 
the LFA. Thus, equal experience to each alterna- 
tive was ensured. During Postshift (Trials 49-120, 
Days 9-20), all Ss were run as were the Run Ss 
during Preshift, with the exception that one 
pellet reward was available on either alternative. 
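
The daily six-trial schedule for Run Ss described above can be illustrated with a short sketch. The Python below is not part of the original procedure; the function name `daily_schedule`, the arm labels, and the `choose` callback standing in for S's free choice are hypothetical. For simplicity the forced order on Trials 3 and 4 is randomized separately for each S, whereas the original balanced it across the Ss of each group.

```python
import random

ARMS = ("MFA", "LFA")

def other(arm):
    """Return the arm opposite to the one given."""
    return "LFA" if arm == "MFA" else "MFA"

def daily_schedule(choose, rng=random):
    """Build one day's six trials as (trial, kind, arm) tuples.

    `choose` is a hypothetical callback returning the arm S enters
    on a free trial.
    """
    trials = []
    # Trial 1 is free; Trial 2 is forced to the opposite arm.
    first = choose()
    trials.append((1, "free", first))
    trials.append((2, "forced", other(first)))
    # Trials 3 and 4 are forced, one to each arm, in random order.
    middle = list(ARMS)
    rng.shuffle(middle)
    trials.append((3, "forced", middle[0]))
    trials.append((4, "forced", middle[1]))
    # Trial 5 is free; Trial 6 is forced to the opposite arm.
    fifth = choose()
    trials.append((5, "free", fifth))
    trials.append((6, "forced", other(fifth)))
    return trials
```

Whatever S chooses on the free trials, each day built this way contains three trials to each alternative, which is the sense in which equal experience is ensured.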

In assigning Ss to groups, the adjoining four 
cages in a cage rack contained one member from 
each group. These four Ss were run in rotation, 
resulting in a 3-4 min. intertrial interval. All such 
sets of four Ss were assigned the same MFA (left- 
White or right-Brown). Squads 1 and 2 in each 
replication were assigned left as the MFA as were 
Squads 7 and 8. Squads 3, 4, 5, and 6 of each rep- 
lication were assigned right as the MFA. Thus 
one-half of the Ss in each of the four experimental 
groups had left-White as the MFA, and one-half 
had right-Brown as the MFA. 


Results 


Nature of specific analyses. Throughout 
this report, differences in choice behavior 
will be evaluated in terms of the stratified 
chi-square test, and running speeds (re- 
ciprocal times) by analysis of variance. 
The precise form of each analysis will be 
described only if it is not obvious. For ex- 
ample, all analyses of variance included 
replications and brightness of the MFA as 
sources, but this will not usually be noted. 
Replications never contributed a major 
source of variance to modify the effects 
cited, and so it will not be mentioned fur- 
ther. The reliable effects of brightness did 
not alter the conclusions in Experiment I 
and so they will not be considered there; 
however, they will be discussed in detail
in Experiment II. 

Because of the large number of de- 
pendent variables, the Placed and Run 
groups are considered independently below. 
In addition, direct comparisons of the 
relative CEs following the Run versus the 
Placed condition are presented. 

The Run groups represented a replication
of Experiment II by Spear and Hill
(1965), except for slight differences in the 
number of trials per day, the apparatus, 
and the employment of discriminably dif- 
ferent alternatives. Accordingly, the results 
were virtually identical. The Ss in the 
present experiment ran uniformly slower 
on the average compared to the Spear and 
Hill results, but the relationships were 
the same. There initially was differential 
preference for the discriminably different 
alternatives, but this will be discussed in 
Experiment II. 

Preshift. Choice probabilities, turning
speeds, and committed speeds throughout
Preshift are shown in Figure 2. Preference
for the MFA was, of course, greater by
Group 12-1 than by Group 1-1, χ²(2) =
13.15, p < .005.

Fig. 2. Preshift choices, turning speed, and committed speed for “Run” Ss (i.e., Ss given conventional instrumental experience during Preshift) in Experiment I.

Fig. 3. Postshift choices, turning speed, and committed speed for “Run” Ss (i.e., Ss given conventional instrumental experience during Postshift) in Experiment I.

The results in terms of turning and committed
speeds were identical to those reported
by Spear and Hill. On the last 4
days of Preshift, turning speed was greater 
to the MFA but less to the LFA for Group 
12-1 relative to Group 1-1 (p < .001). The 
SimCE also occurred in committed speed— 
slower speed to the LFA was found by 
Group 12-1 compared to Group 1-1 (p < 
.05)—but committed speed to the MFA 
did not differ reliably between groups 
(F < 1).

Postshift: Run groups. Choice proba- 
bilities, turning speeds, and committed 
speeds are shown for the Run groups in 
Figure 3. It may be seen that the basic 
results obtained by Spear and Hill (1965) 
were also replicated in the Postshift stage 
of training by Ss in the Run groups. In 
particular, there was no tendency for the 
12-1-Run group to show less preference
than the 1-1-Run group for the MFA at 
any point. Also as in the Spear and Hill 
study, the turning speeds of the 12-1 group 
gradually adjusted to the appropriate level 
with no indication of a SucCE on the MFA. 
The committed speeds of Group 12-1 also 
eventually adjusted to the level of the 
baseline controls, although again there was 
some tendency for a SucCE on the MFA 
immediately subsequent to the shift in the 
reward magnitude there. In fact, mean 
committed speed over the first 4 days of 
the Postshift stage was less in Group 12-1 
than in Group 1-1 on the MFA, F (1,24) = 
7.20, p < .02; but this fact is difficult to 
interpret in view of the spuriously slow 
speed on the MFA by Group 12-1 at the 
end of Preshift. In any case, this may be 
added to the data accumulated by Spear 
and Hill (1964, 1965), which suggest a 
weak but continually appearing SucCE in 
this situation.

Fig. 4. Postshift choices, turning speed, and committed speed for “Placed” Ss (i.e., Ss given only consummatory experience during Preshift) in Experiment I.

Postshift: Placed groups. The Postshift
performance of the Placed groups shown in
Figure 4 indicated that very little latent 
learning took place during the Preshift 
placements. Their speeds in the Postshift 
stage began at about the same rate as the 
speeds shown by the Run groups at the 
beginning of the Preshift stage. On the 
other hand, the choice data do suggest a 
slight preference for the formerly MFA in 
Group 12-1-Placed relative to Group 1-1- 
Placed during the first 4 days of Postshift, 
although this difference did not quite attain
statistical reliability, χ²(2) = 4.07,
p < .15.

It is clear in Figure 4 that the differential
reward magnitude during the Preshift
stage made little difference in terms of 
running speeds, with the exception of com- 
mitted speeds to the LFA. It is notable that 
this SimCE, which was established during 
the Preshift placements, provided the only 
statistically reliable evidence that Preshift 
reward experience had an effect, shown by 
the fact that the speed to the LFA re- 
mained slower at this point for the 12-1 
group relative to the 1-1 group, F(1,24) =
6.11, p < .025.

Postshift: Run versus placed groups. 
The combined results with the Placed and 
the Run conditions would seem to dictate 
one major conclusion: the tendency toward 
SucCEs in terms of running speed is more 
likely to occur following instrumental ex- 
perience with the reward magnitude than 
following only consummatory experience. 
In fact, in every instance of a comparison 
in terms of running speeds, the SucCE was 
at least numerically greater for the Run 
conditions. This is consistent with data 
produced by Goodrich (1962), Goodrich 
and Zaretski (1962), and Spear (1965a) 
and with other unpublished runway data 
from our laboratory. 

Consider the committed speed on the 
MFA during Days 1-4 of the Postshift
stage (see Figure 5). The analysis revealed 
a statistically reliable SucCE overall, 
F(1,56) = 4.77, p < .05; and, as expected, 
the Run groups had greater mean speed 
overall than the Placed Groups, F(1,56) = 
127.66, p < .001. The critical finding in 
this case was the interaction between reward
magnitude during Preshift and the
Preshift treatment, F(1,56) = 3.98 (F >
4.02 is required for p < .05). In particular,
the committed speed to the MFA was less 
for the 12-1 than the 1-1 groups in the Run 
conditions but was about equal in the 
Placed conditions. Thus the SucCE oc- 
curred in the Run conditions but not in the 
Placed conditions. 

On the other hand, the SimCE, meas- 
ured on the LFA as it carried over into 
Days 1-4 of the Postshift stage, did not 
vary between the Run and Placed condi- 
tions (see Figure 6). Overall, speeds to the 
LFA were less for the 12-1 groups than for 
the 1-1 groups at this point, F(1,56) = 
4.92, p < .05, and speed was greater for
the Run groups, F(1,56) = 99.14, p <
.001. In contrast to the SucCE (and this
is the critical point), there was no trace
of an interaction between these conditions,
F(1,56) < 1. That is, the carry-over of
the SimCE into Postshift occurred about 
equally whether the Ss had received in- 
strumental or only consummatory Pre- 
shift experience. It is noteworthy that 
these Placed groups represent the only 
clear evidence that the SimCE may be 
greater than the SucCE.

In terms of choice behavior during the 
first 4 days of the Postshift stage, there 
were no statistically reliable differences in
the effects of reward magnitude for the
Run versus Placed conditions. In particular,
there was no interaction between reward
magnitude during Preshift and Run
versus Placed conditions, χ²(2) = .77. This
same analysis showed that the 12-1 groups
preferred the formerly MFA with greater
frequency than did the 1-1 groups, χ²(2)
= 10.75, p < .01, and greater choice of
the MFA was found in the Run groups,
overall, than in the Placed groups, χ²(2)
= 10.75, p < .01.
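
As a rough check on the probability bounds reported for these choice analyses, the upper-tail probability of a chi-square statistic with 2 degrees of freedom has the closed form exp(-x/2). The sketch below is not part of the original analysis; the function name is hypothetical, and it assumes that the printed 2 df applies to each statistic.

```python
import math

def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square variate with 2 df."""
    return math.exp(-x / 2.0)

# Statistics reported in the preceding paragraph.
p_interaction = chi2_sf_df2(0.77)   # interaction term: far from reliable
p_main = chi2_sf_df2(10.75)         # main effects: below the .01 level
```

Under that assumption the interaction value falls well short of conventional reliability, while a statistic of 10.75 lies below the .01 level, in agreement with the text.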

In absolute terms there was really no 
evidence for a SucCE in terms of choice
behavior within either the Run or the
Placed conditions. In no case did Ss prefer
the formerly LFA during Postshift.
However, with the performance of Group
1-1 as the baseline rather than absolute
preference, the relationship between the
choice behavior of Ss in the Placed and the
Run conditions may be more reasonably
compared. Recall that this experiment was
originally designed with this comparison 
in mind. It was expected that more evi- 
dence for a SucCE in terms of choice be- 
havior would be obtained in the Placed 
than in the Run conditions. The question, 
then, is whether the relationships between 
the 12-1 and 1-1 conditions differed at any 
point in Postshift for the Placed versus the 
Run conditions. The answer would be
“yes” if a significant interaction were ob- 
tained between reward magnitude and Pre- 
shift training conditions. Therefore, a chi- 
square test of number of correct choices 
was performed over the last 4 days of the 
Postshift stage. It was found that the predicted
interaction did occur, χ²(2) = 4.03,
p < .05, in the direction that the 12-1 condition
chose the formerly MFA more often
than the 1-1 condition within the Run 
groups, but the opposite was true within 
the Placed groups. However, this latter 
tendency toward a SucCE in terms of 
choice behavior within the Placed groups 
is spurious. As can be seen in Figure 4, 
this relationship within the Placed groups 
is not the result of less preference for the 
formerly MFA in Group 12-1 but rather it 
is a consequence of greater preference for 
the "dummy" LFA shown by the 1-1 
group. Clearly not a great deal of weight 
can be placed on this finding, and it must 
be concluded that a SucCE in choices did 
not occur. 


Discussion 


Three major points of information were 
provided by this experiment. First, the 
SucCE in terms of choices was not obtained.
Whether the Ss had been placed or
run during their Preshift experience did
have some effect on the tendency toward a 
SucCE in choices, but the effect was mar- 
ginal at best. Second, the SimCE again 
demonstrated its robustness by appearing 
equally whether the Ss had had instrumen- 
tal-plus-consummatory, or only consum- 
matory, experience with the differential 
reward magnitudes prior to the reward 
shift. Third, although the SucCE in running
speed was again mildly present after
conventional Preshift experience, it was 
weakened and essentially erased when 
Preshift experience included only consum- 
matory activity. Each of these three re- 
sults is discussed below. 

Contrast effects in choice behavior. 
Within the groups given conventional in- 
strumental experience during the Preshift 
stage, there was no evidence that the 12-1
group ever preferred the formerly MFA 
with less frequency than did the 1-1 group. 


Had this occurred, it would have defined 
the SucCE in choice behavior. Including
the two experiments reported by Spear 
and Hill (1965) and Experiments II and 
IV of the present report, there is now a 
total of five experiments in which the Suc- 
CE in choice behavior has not appeared 
following conventional Preshift training. 

It is true that in the latter stages of the 
Postshift experience, the SucCE was numerically
defined in the Placed groups but
not in the Run groups, and this interaction 
was statistically reliable. However, the 
above-chance preference for the MFA by 
the 1-1 Placed group limited the implica- 
tions of this fact. Moreover, it is also sus- 
picious that this CE in choice behavior 
should not occur until so late in the Post- 
shift stage, a point at which the CEs in 
speed have typically disappeared. Finally, 
it seemed strange that the “contrast ef- 
fect” in choice behavior was not accom- 
panied by the typically more sensitive CE 
in running speed within the Placed groups. 

All things considered then, it was con- 
cluded that a CE in terms of choice be- 
havior had not been demonstrated within 
this experimental paradigm. It may be 
possible to obtain such an effect on a 
position discrimination task by increasing 
drastically the number of Preshift trials 
(cf. Birch, 1964; Vogel, Mikulka, & 
Spear, 1966), or perhaps by using a visual 
discrimination task. These possibilities are 
currently being pursued but will not be 
considered in the remainder of this report. 

Contrast effects in running speeds. The 
occurrence of the persisting SimCE subse- 
quent to only consummatory experience 
during the Preshift stage—an instance of 
“latent learning" of reward magnitude— 
was important for several reasons. First, 
this represented the only known occurrence 
of a CE in running speeds subsequent to 
this limited kind of experience with the 
differential rewards. All the previous ex- 
periments that have attempted to show a CE 
subsequent to initial consummatory ex- 
perience with the particular reinforcement 
(previously only the SucCE had been 
tested in this way) have failed to do so 
when a running-speed measure was em- 
ployed (e.g., Goodrich, 1962; Spear, 1965a). 
Of course, such CEs apparently are readily
obtained with a bar-press response measure 
(Collier & Marx, 1959). 

Second, this fact was especially impor- 
tant in relation to interpretations of CEs 
which require instrumental experience dur- 
ing the Preshift stage (cf. Pereboom, 
1957). The finding of a CE after only 
consummatory experience with the rewards 
requires that theoretical emphasis be
placed squarely on the stimulus properties 
of the reward itself, whether preingestive 
or postingestive. 

Finally, it was important that the 
SimCEs occurred equally in the Placed 
and Run conditions, but that the SucCE
did not. The weak trace of the SucCE 
shown in the Run conditions was com- 
pletely absent in the Placed conditions. 
Although the implications are not entirely 
clear, this fact does suggest the possibility
that the SucCE is governed by processes 
that are different from those responsible 
for the SimCE (see Discussion of Experi- 
ment IV). 

One explanation of the lesser SucCE 
when only consummatory experience was 
given during the Preshift stage might em- 
phasize the “memory,” or in Capaldi’s 
(1963) language, the “aftereffects,” of the 
Preshift reward magnitude. It is clear that 
the occurrence of the SucCE is strongly dependent
upon the retention of the aftereffect
(or at least some representational
response) of the Preshift reward. It may 
be that the rate of the running response 
itself during Preshift is an important com- 
ponent of this aftereffect. Thus, when only 
Preshift consummatory experience is given, 
this component is absent and the afteref- 
fects of the Preshift reward are less avail- 
able for comparison during the Postshift 
stage. It should be clear that the retention 
requirement is not as great in the SimCE 
as in the SucCE paradigm. 

Although Experiment I demonstrated 
the effect of type of Preshift experience 
and replicated the basic phenomena ob- 
tained by Spear and Hill, there were three 
features of methodology that remained to 
be clarified. One of these features was the 
differential preference for the Brown ver- 
sus the White alternative and its effect on 


CE phenomena: Does the effect of a shift 
in reward interact with S’s operant level 
of responding? The second was a trouble- 
some feature that pops up from time to 
time in this kind of experiment, which we 
labeled the “tracking” phenomenon. The 
third, tested in Experiment III, concerned 
the possibility that the SimCE might be 
an artifact of the T maze. 


EXPERIMENT II 


A primary reason for this second experi- 
ment was to clarify two points regarding 
methodology. In doing so it was necessary 
to closely replicate the experimental con- 
ditions of the Run groups in Experiment 
I. First, certain potentially interacting ef- 
fects concerning the differently appearing 
alternatives appeared interesting enough 
to warrant a closer look at these phenom- 
ena with an increased sample, thus pro- 
viding a more powerful test. Second, in the 
experiments by Spear and Hill (1965), in 
Experiment I of the present studies, and in 
several other investigations from our labo- 
ratory, it had been noted that control 
groups not shifted in reward magnitude 
tended to behave as if they, too, had been 
shifted along with the experimental groups. 
They tended to behave as did the experi- 
mental rats immediately preceding them in 
the maze, and this behavior we labeled 
"tracking." 

We had been aware of this latter possi- 
bility for some time and had taken meas- 
ures to guard against the occurrence of E 
bias and systematic E errors. And, of
course, the baseline control groups, such as 
Group 1-1, were always included instead 
of depending upon an absolute baseline. 
Still, there remained a needling tendency 
for the Group 1-1 controls to increase their 
probability of choosing the arbitrarily de- 
termined MFA when, for example, the ma- 
jority of Ss in the 12-1 group chose it. This 
was most apparent when every S in a given 
rotation was assigned the same side of the 
T maze as its MFA. Naturally, there was 
always an equal number of Ss from each 
experimental condition represented within 
a given rotation, so the conditions would 
be equally affected. To the extent that all 
Ss in a given rotation had a common MFA, 




however, a sort of tracking phenomenon 
occurred. Tracking may have been exhib- 
ited in Experiment I of the Spear and Hill 
paper by the tendency for Group 1-1 to
choose the arbitrarily designated MFA 
with a probability greater (numerically) 
than chance during the Preshift stage. It 
also may have appeared during the Post- 
shift stage in Experiment II of that paper, 
as Group 1-1 decreased their choice of the 
arbitrarily designated MFA at about the 
same rate as Group 12-1. 

Now it should be emphasized that if 
tracking did occur, it in no way confounded 
the conclusions, the reasonable assumption 
being that the effect of tracking is uniform 
over experimental conditions. All condi- 
tions were initially assigned their MFA in 
the same way, and baseline controls were 
always used. 

The occurrence of this type of phenom- 
enon is usually dismissed as a chance fac- 
tor; indeed, the indications of tracking 
found in the Spear and Hill experiment 
did not attain statistical reliability. However,
ethologists readily accept the possibility
that rats may communicate via odors
left in the maze that may “help an
animal to remember its way about" (Bar- 
nett, 1963, p. 31, p. 78). The intention of 
the present experiment was to maximize 
the possibility of such tracking behavior, 
but to restrict its source to odors more
subtle than the occurrence of urine, feces, 
and mere number of rats that had preceded 
S down a particular path; thus, the present 
procedure included both the careful re- 
moval of urine and fecal traces, and the 
approximate equating of the number of 
previous rats that had gone to either al- 
ternative prior to a given S’s test. The 
critical difference was to be that half of 
the previous Ss went to the LFA and half 
to the MFA. The question then was not 
just whether rats would follow the path of 
other rats, but whether they would follow 
them differentially to the MFA versus the 
LFA.


Method 


Subjects and apparatus. The Ss were 32 naive 
female albino rats of the Sprague-Dawley strain, 
approximately 60 days old at the start of pre- 
handling and weighing 180-200 gm. The apparatus 


and response measures were the same as those 
used in Experiment I, with the exception that 
there were two sets of colored inserts (discussed 
below). 

Procedure. Maintenance, deprivation, and pre- 
handling conditions were identical to those of Ex- 
periment I. The Ss were fed a daily ration of 10 
gm. of finely ground Purina lab chow, supple- 
mented by 40 .045-gm. regular Noyes pellets 
(minus the number received in the T maze each 
day), in their home cages a minimum of 10 min. 
after the daily training. Water was always avail- 
able. 
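
The pellet supplement can be worked through as simple arithmetic. The sketch below is an illustration only and not part of the original procedure: it assumes that S consumes every pellet offered in the maze and that each day includes three trials to each alternative, as the forced-trial schedule provides. The function and constant names are hypothetical.

```python
DAILY_PELLETS = 40      # .045-gm. pellets allotted per day
TRIALS_PER_ARM = 3      # forced trials equate experience on each arm

def maze_pellets(mfa_reward, lfa_reward):
    """Pellets eaten in the maze in one day, assuming all are consumed."""
    return TRIALS_PER_ARM * (mfa_reward + lfa_reward)

def home_cage_pellets(mfa_reward, lfa_reward):
    """Pellets left to be fed in the home cage after the daily trials."""
    return DAILY_PELLETS - maze_pellets(mfa_reward, lfa_reward)

preshift_12_1 = home_cage_pellets(12, 1)   # 40 - 39 = 1
preshift_1_1 = home_cage_pellets(1, 1)     # 40 - 6 = 34
```

On these assumptions a Group 12-1 S receives nearly its whole pellet allotment in the maze during Preshift, whereas a Group 1-1 S receives most of it in the home cage.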

The Ss were randomly assigned to one of two 
groups and one of four rotations. As in Experi- 
ment I, Group 1-1 was run to 1 pellet on either 
alternative throughout the experiment, and Group 
12-1 was run to 12 pellets on the MFA and 1 pellet 
on the LFA during Preshift, and to 1 pellet on 
either alternative during Postshift. Each rotation 
was defined in terms of the color and location of 
the assigned MFA and included four Ss from each 
group. For two of the rotations assigned Brown as 
the MFA, one set of colored, T-maze arm inserts 
was employed; for one of these rotations, Brown 
was on the left, and for the other, it was on the 
right. For the other two rotations, White was as- 
signed as the MFA, and the second set of inserts 
was employed; for one of these rotations, White 
was on the left, and for the other, on the right. 
Since it was hypothesized that rats leave differen- 
tial spoors in accord with the “attractiveness” of 
the alternative, it was felt that the strength of 
these traces might summate across trials; that is, 
as more members of Group 12-1 ran to a given 
MFA and LFA, the differential spoors left behind 
by these Ss would accumulate. 

As in Experiment I, all Ss were given six trials 
per day throughout the experiment. Trials 1 and 
5 of each day were free trials, and Trials 2 and 6 
were forced to the side opposite that chosen on 
the preceding trial. Trials 3 and 4 were forced, 
one to either alternative, with the stipulation that 
within each rotation and on any trial, half of the 
Ss in each group were forced to one alternative 
and half to the other.

In order to detect tracking in Group 1-1 Ss, 
a systematic running procedure was followed 
throughout training: each rotation was run as a
unit, and several minutes separated the running 
of successive rotations. On each day, the four 
members of Group 12-1 in each rotation were 
given their first two trials before any members of 
Group 1-1 were run. Thus, each of the Group 12-1 
Ss ran one trial to each alternative before Group 
1-1 Ss entered the apparatus. This presumably 
maximized the presence of differential spoors, 
while equating the number of rats which traversed 
each alternative. For the next four trials, all eight 
Ss in the rotation were run in succession as fol- 
lows: first there were two Ss run from Group 1-1, 
then four Ss from Group 12-1, then two Ss from 
Group 1-1. The same Ss were run in the same 
order throughout training. Following these trials, 
the four Ss in Group 1-1 were given their final 
two trials in rotation. It was believed that the
final two Ss in Group 1-1 would show greater evi- 
dence of tracking than the first two, since on suc- 
cessive trials, they would immediately follow the 
Group 12-1 Ss. The first two Ss from Group 1-1 
would, on trials after the first, receive their trials 
following the final two Ss in Group 1-1. This proce- 
dure also insured that preceding a trial given an 
S from Group 1-1, an approximately equal number 
of rats had experienced either alternative. In addition,
urine and feces were removed from the maze
after each trial. Thus, cues based on differential 
number of previous Ss and upon differential urine 
and feces (which could indicate emotional re- 
sponses) were not available to Ss from Group 1-1. 

As in Experiment I, Ss were run for 8 days of 
Preshift and 12 days of Postshift. It should be em- 
phasized that the present design did not maxi- 
mize the possibility of obtaining evidence for 
tracking behavior. The design was restricted by 
the additional aim of replicating and clarifying 
the result obtained in Experiment I relevant to 
the effects of color of alternative. Had the hy- 
pothesized "tracking" phenomenon been of major 
interest, a more sensitive test of its existence 
might surely have been devised. 


Results 


No convincing evidence for tracking, as 
defined here, was found. Several analyses
were employed as tests of this phenomenon 
(within the limited design employed), and 
they all yielded negative results. Only a 
few of these analyses need be noted here 
as examples of the kinds of tests possible 
from this experiment. 

First there were direct tests in which the 
absolute performance of Ss in Group 1-1 
could be inspected in terms of choice of the 
MFA (i.e., that alternative of the maze
which was the MFA for the Ss in Group 
12-1 run in the same rotation as the respec- 
tive Ss in Group 1-1 relative to the LFA). 
It was found that Ss in Group 1-1 chose 
both alternatives with about equal fre- 
quency during the Preshift stage and thus 
showed no tendency toward the 12-1 MFA. 
Moreover, during the Postshift stage there 
appeared no tendency for Ss in the 1-1 
group to decrease their frequency of choos- 
ing the formerly MFA in accord with the 
behavior of the Ss in Group 12-1 (as apparently
had been the case in
Experiment II of the Spear-Hill paper). In terms
of running speed, Group 1-1 yielded no
tendency during the Preshift stage toward
faster turning or faster committed speed in 
the MFA than in the LFA. There oc- 


curred a slight tendency toward a decline 
in MFA turning speed during Postshift 
which was somewhat suggestive of a track- 
ing effect in Group 1-1, but the lack of any 
similar effect in the committed speed made 
this result quite spurious. In terms of the 
relative effect of tracking on those Group 
1-1 Ss run before the Group 12-1 Ss, com- 
pared with those Group 1-1 Ss run after 
the Group 12-1 Ss in a given rotation, there 
were again no reliable signs that tracking 
was contributing variance. 

Second, recall the expectation that to the 
extent that tracking did occur, it would 
most strongly influence those Ss in Group 
1-1 which were run in the rotation imme- 
diately following the four Ss in Group 
12-1. None of the several analyses sug- 
gested this occurrence. 

One other set of tests evaluated per- 
formance of Group 1-1 Ss, expecting that 
any influence of tracking should have 
been more apparent during the later trials 
of the daily session than during the initial 
trials. These analyses also yielded no positive
evidence for tracking.

Thus it is concluded that when the total 
number of rats running in either alterna- 
tive is approximately equated and when 
urine and feces are removed from the 
maze, the performance of a given S is not 
seriously affected when run in the same 
rotation as other Ss which have a common 
MFA and LFA. 

Replication of past results. The choice 
behavior and running speeds measured in 
this experiment agreed with those obtained 
in previous experiments on SimCEs and 
SucCEs. The statistical analyses confirmed 
this replication in terms of all essential 
facts. The absolute values agreed nearly 
completely with those of Experiment I 
with the exception of a slight, though uni- 
form, increase in MFA speeds by Group 
12-1 of Experiment II. Therefore, there is 
no need to repeat the statistical particulars 
here. 

The effect of brightness of alternative. 
It is a fact that in an apparatus such as 
was used in Experiments I, II, and IV of 
this report, albino rats have an initial 
(operant) preference for the darker of the 
alternatives. Is the effect of a reduction in 
reward the same regardless of S's operant
level of performance? By assigning alter- 
natives equally as MFA and LFA and 
combining data from comparable groups 
in Experiments I and II, a large sample 
could be used to provide an answer. 
Specifically, 32 Ss from each of Groups 12-1 
and 1-1 were considered; half had been 
randomly assigned White as MFA and 
half Brown. 

The particular question concerned the 
relative change in behavior in an alterna- 
tive, concomitant with a change in condi- 
tions of reinforcement, when that alterna- 
tive was initially the preferred, compared 
with when it was initially the unpreferred, 
alternative. For example, would S more 
rapidly decrease its response rate and/or 
preference in an initially unpreferred alternative
when the magnitude of reward is
reduced in the alternative in question? 

Preference for the Brown alternative. 
The first trial in the T maze was a free 


trial, and 61% of all Ss chose Brown on 
this trial. This preference did not decrease 
with succeeding experiences in the two al- 
ternatives, even when both alternatives 
were rewarded equally. In fact, preference 
for the Brown increased with training. 
During the last 2 days of the Preshift 
stage, Ss from Group 1-1 were choosing 
Brown on 67% of the free trials. Through- 
out the Postshift stage, 27 of the 32 Ss 
from Group 1-1 showed an overall prefer- 
ence for the Brown alternative. Averaging 
across Postshift trials, it was found that 
Ss from Group 1-1 chose the Brown alter- 
native 73% of the time. These choice data 
may be seen in Figure 7. 

Effect of brightness of alternative on 
conclusions concerning changes in behav- 
ior. When reward magnitude was reduced 
in the MFA, choice of the MFA by Ss in 
Group 12-1 was more greatly reduced when 
that alternative was White than when it 
was Brown (see Figure 7). This was con- 
firmed statistically by a Mann-Whitney 
U test on the differences between the num- 
ber of MFA choices by Group 12-1 during 
the last half of Preshift and the number of 
formerly MFA choices by this group during 
the first half of Postshift, U = 38, p <
.001. Because Group 1-1 had been included
to provide the appropriate baseline, it may 
be seen that the greater change in choice 
behavior when White was the MFA is 
simply a consequence of the fact that 
these Ss required a greater behavioral 
change to adjust to their baseline control 
(100% to 25%) than did Ss with Brown 
as the MFA (from 100% to 75%). Of 
course, we may not be dealing with an 
equal-interval scale here and probably are 
not. Nevertheless, it does appear that if 
one were to extrapolate the curves in Fig- 
ure 7, both subgroups of Condition 12-1 
would have adjusted to the level of the 
baseline control at about the same point in 
the hypothetically extended Postshift stage. 
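
For readers unfamiliar with the statistic, the Mann-Whitney U referred to above can be obtained by counting, over every pair of Ss drawn one from each subgroup, how often a change score in one subgroup exceeds a change score in the other. The following is a minimal sketch in Python. The change scores are hypothetical placeholders (individual data are not reported here), and the function and variable names are ours, not part of the original analysis.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) by direct pair counting; each tie counts one half."""
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    # The two U values always sum to the number of pairs.
    u_b = len(sample_a) * len(sample_b) - u_a
    return u_a, u_b


# HYPOTHETICAL change scores: drop in number of MFA choices from the last
# half of Preshift to the first half of Postshift, one value per S.
white_mfa_change = [5, 6, 4, 7, 6, 5]
brown_mfa_change = [2, 1, 3, 2, 4, 1]

u_white, u_brown = mann_whitney_u(white_mfa_change, brown_mfa_change)
u_statistic = min(u_white, u_brown)  # the smaller U is the one conventionally reported
print(u_white, u_brown, u_statistic)
```

A small U, as in this made-up example, means the two sets of change scores barely overlap; the significance level would then be read from tables or a normal approximation for the actual group sizes.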

These facts are important in view of the 
erroneous conclusions that could result if 
the appropriate control groups had not 
been included. For example, if the baseline 
had not been established by Group 1-1, the 
more rapid change in choice behavior by 
Group 12-1 Ss with White as the MFA 
might have taken on quite a different
meaning. Or, had only Group 12-1 been 
included with Brown as the MFA, one 
might have concluded that choice behavior 
is relatively impervious to change as a con- 
sequence of a reduction in reward in one 
alternative so long as reward remains in 
both alternatives. 

The question of relative change in be- 
havior as a function of color of alternative 
may also be asked in terms of running 
speed measures. In certain respects, run- 
ning speed is more sensitive to changes in 
reward magnitude than is choice behavior 
in this situation (Spear & Hill, 1965). 

Figure 8 shows the course of running 
speed during Preshift for Ss having White 
as the MFA and for those having Brown 
as the MFA. Without going into detail, it 
may be said that no reliable interactions 
occurred between brightness of the MFA 
and reward magnitude. This was true in 
terms of both the rate of increase in speed 
during the first few days of Preshift and 
in terms of asymptotic speeds at the end 
of Preshift. 
Brightness of the MFA and LFA did,
however, determine the adjustment of run- 
ning speeds subsequent to the reduction in 
MFA reward from 12 pellets to 1 pellet. 
Turning speeds during the Postshift stage 
are shown in Figure 9 as a function of 
these variables. The top half of the figure 
shows the decrease in MFA turning speed 
by the 12-1 groups in relation to the base- 
line: here the baseline is defined in terms 
of the 1-1 control groups. The lower half 
of the figure again shows the decreasing 
MFA speeds by Group 12-1, but in this 
case the baseline is the corresponding LFA 
speeds of this group. 

In view of the control groups and coun- 
terbalancing employed, it may be seen that 
the interacting effects of brightness with 
reward magnitude are not really serious 
for our purposes. However, it is clear in 
Figure 9 that the precise conclusions de- 
rived from an experiment of this kind 
could be greatly influenced by the bright-
ness of the MFA (or any similar factor
creating differential operant preference),
particularly if it were not balanced across
conditions. For example, had only the 
White alternative been employed as the 
MFA in the top figure, one would have 
concluded that adjustment in turning 
speeds is completed rather rapidly—by the 
fifth and sixth days of the Postshift stage. 
On the other hand, had only the Brown 
alternative been employed as the MFA, 
the conclusion would have included some 
doubt as to whether speeds on the MFA 
would ever adjust to the baseline, since 
they gave no indication of doing so even 
after 72 Postshift trials. In terms of the 
final level of adjustment of turning speed 
to the MFA (last half of the Postshift 
stage), brightness of alternative did not 
reliably interact with Preshift reward, 
F(1,60) = 1.69, p > .10.

The bottom of Figure 9 illustrates the 
same principle but with the added com- 
plication that the baseline is defined in 
terms of the performance on the LFA by 
Ss in Group 12-1. Since running speed to 
the LFA increased during the Postshift 
stage as a consequence of its recovery from 
the simultaneous depression effect, it clearly 
provides an inappropriate baseline for the 
definition of the SucCE. In this case, con-
clusions based upon turning speed during
the last half of Postshift are clearly biased
by brightness of MFA. A mixed analysis
of variance (Brightness × Preshift re-
ward) revealed a statistically significant
interaction, F(1,30) = 6.02, p < .025,
which reflected the convergence of MFA 
and LFA speeds for Ss with the White 
MFA, in contrast to the continued separa- 
tion of these speeds when the former MFA 
was Brown. These same general changes in 
behavior were also obtained in terms of the 
committed speeds, occurring more rapidly 
than those of turning speed. 


Discussion 

Experiment II yielded information rele-
vant to: (a) the influence of tracking on
behavior in the present T-maze situation;
(b) replication of facts concerning CEs in
a position discrimination; and (c) the
necessity for including baseline control
groups and for counterbalancing T-maze-
alternative characteristics among MFA
versus LFA assignment in this kind of re-
search. These factors are discussed briefly
below.

a. It was found that if such tracking 
does exist in the present situation, it is not 
an important source of variance. No evi- 
dence for tracking could be obtained when 
care was taken to remove obvious signs 
(feces, urine) from the maze and when a 
trial for a given rat had been preceded by 
an approximately equal number of trips to 
each alternative by previous rats in the 
maze. 

b. The major facts obtained previously 
in experiments on the SimCEs and SucCEs 
were replicated. These included the find- 
ings of a reliable SimCE in running speed, 
a numerical but weak SucCE in running 
speeds, no SucCE in choices, and eventual 
adjustment to the level of the baseline 
control, in terms of all response measures, 
following the reward shift. 

c. It was shown that the lack of inde- 
pendent control groups necessary to estab- 
lish a baseline and the failure to balance 
out the initially preferable alternative 
when assigning the MFA and LFA could 
result in inappropriate conclusions con- 
cerning adjustment of behavior following a 
shift in the conditions of reinforcement. In 
particular, it was shown that if the initially 
less preferable alternative is the MFA and 
the more preferable alternative is the LFA, 
behaviors in these alternatives converge 
much more rapidly once the alternative 
reinforcement conditions are equated than 
when assignment of MFA and LFA is re- 
versed. In the latter situation, there is 
relatively little change in behavior even 
after 72 Postshift trials. 


Experiment III 


This experiment, like Experiment II, was
concerned primarily with problems of 
method and interpretation. There were 
still three points regarding the SimCE and 
SucCE paradigm that needed clarification. 
These points required an experiment which 
tested for SimCEs and SucCEs of reward 
magnitude, but with alternative stimuli 
which were relatively independent. When
the T maze was used, experience with one 
alternative excluded the possibility of ex- 
periencing the other. This was particularly 
true on free trials, and it could conceivably 
influence behavior on forced trials. In Ex- 
periment III, this was changed by using 
two distinctive runways (one White and 
one Black). Thus, only one possible “al- 
ternative” existed on each trial. Each rat 
experienced both runways. During Pre- 
shift, one runway was associated with the 
larger reward, the other always with the 
smaller reward; and both runways were 
associated with the smaller reward during 
Postshift. 

The first point of interest was whether a 
SimCE could be measured during Preshift. 
Would Ss run slower for the small reward 
in one runway when the larger reward was 
presented in the other runway than if the 
same small reward was available in both 
runways? If the SimCE did not exist in 
such a situation, one could always argue 
that its occurrence in the T maze was an 
artifact created on forced trials to the 
LFA in the 12-1 groups. Perhaps the at- 
traction to the alternative MFA acted to 
"pull S back" from the LFA rather than 
to slow his progress to the LFA per se as 
is implied by the term “contrast effect.” 
Other investigators have conducted studies 
corresponding to the Preshift stage of 
such an experiment. Goldstein and Spence
(1963) found no evidence for such a phe- 
nomenon, but Bower (1961) and Bower 
and Trapold (1959) did. Because of this 
disagreement, a test employing our condi- 
tions seemed desirable. 

The second point concerned the SucCE. 
Recall that the SucCE as measured in the
present paradigm is weak and may lack
statistical reliability within a given experi-
ment. Perhaps some aspect of the T-maze
apparatus was limiting its effectiveness. 
This possibility, therefore, was investi- 
gated by including the SucCE paradigm 
using the present dual runway apparatus. 

Finally, the interacting effect of bright- 
ness of alternative was considered. It was 
possible that when the Black and White
alternatives were experienced separately 
in the double runway situation, the effect of
this variable might be relatively slight in
comparison with the T maze in which S 
may compare Black and White simul- 
taneously. Although Experiment III did 
not provide a rigorous test of this possibil- 
ity, a rough estimate of the influence of 
Black versus White as an absolute source 
of variance was obtained. 


Method 


Subjects. The Ss were 28 female albino rats of 
the Sprague-Dawley strain approximately 65 days 
old at the start of experimental training. All Ss 
had been run by a different E in the same two 
runways prior to the present training under simi- 
lar conditions of reinforcement with the exception 
that all Ss had been differentially reinforced, re- 
ceiving 10 pellets on the MFA and no reward on 
the LFA, following an initial series of rewarded 
trials in both alleys and a series of nonrewarded 
trials in both alleys. Approximately 20 trials had 
been given each S prior to the present training, 
which began 2 days after the last trial under the 
prior conditions. In the present study, all Ss were 
assigned the same MFA as in the initial training. 

Apparatus. The testing apparatus consisted of 
two Hunter runways which, briefly, consisted of a 
start box 5 in. wide and 1 ft. long; an alley 4 in. 
wide and 33 in. long; and a goal box 5 in. wide 
and 1 ft. long. All sections were 4 in. high and 
constructed of Masonite, except for the top and 
sides of the alleys and goal boxes which were 
Plexiglas. Raising the guillotine start door, which 
separated the start box from the alley, started a 
Standard Electric Timer which was stopped by 
the interception of a photobeam located 4 in. be- 
yond the start door. Interruption of the first beam 
started a second clock which was stopped by the 
interruption of a second beam located 7 in. inside 
the goal box door separating the goal box from 
the alley and 3 in. before the food cup which was 
located at the rear of the goal box. One of the 
runways, designated White, was painted white 
throughout, except for the top, which was clear. 
The other runway, designated Black, was painted 
black throughout, except for a ¥%-in. strip along 
the top of the alley and goal-box portions of the 
runway. 

Procedure. The Ss were maintained on ad lib 
water, 30 45-mg. regular Noyes pellets, and 10 gm. 
of finely ground Purina lab chow daily. The pel- 
lets (minus those received in the runway) and 
chow were combined and given to S 15-20 min. 
after the daily training. 
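
The feeding regimen above implies a simple piece of bookkeeping: pellets earned in the runway are subtracted from the 30-pellet allowance, and the remainder is given with the chow after training. The sketch below works this through in Python for the two reward schedules described later in this Method section (four trials per day, two per runway). It assumes S ate every pellet presented in the runway, which the text does not guarantee; the function and constant names are ours.

```python
PELLET_MG = 45          # weight of one Noyes pellet
DAILY_PELLETS = 30      # total daily pellet allowance
DAILY_CHOW_MG = 10_000  # 10 gm. of ground lab chow


def post_session_ration(pellets_eaten_in_runway):
    """Return (pellets still owed, total mg of pellets plus chow given after training)."""
    remaining = DAILY_PELLETS - pellets_eaten_in_runway
    if remaining < 0:
        raise ValueError("runway rewards exceed the daily pellet allowance")
    return remaining, remaining * PELLET_MG + DAILY_CHOW_MG


# ASSUMPTION: every pellet offered in the runway is consumed.
# Group 12-1 during Preshift: two MFA trials (12 pellets) and two LFA trials (1 pellet).
eaten_12_1 = 2 * 12 + 2 * 1
# Group 1-1 (and all Ss during Postshift): four trials at 1 pellet each.
eaten_1_1 = 4 * 1

total_daily_mg = DAILY_PELLETS * PELLET_MG + DAILY_CHOW_MG
print(total_daily_mg)                  # full daily ration in mg
print(post_session_ration(eaten_12_1))
print(post_session_ration(eaten_1_1))
```

Under these assumptions the total daily intake is the same for every S; only the share eaten in the runway differs between groups.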

All Ss were given four trials per day. The run- 
way used on a particular trial was randomly pre- 
determined with the stipulation that half of the 
trials of each day (two) be in each runway. The 
Ss were run in rotations of four, resulting in ap- 
proximately a 4-min. intertrial interval. Each S 
was removed and returned to its home cage upon 
consuming the reward, provided it remained a 
minimum of 15 sec. and no longer than 3 min. 

Fig. 10. Mean running speed during the last two
days of Preshift for each group in Experiment III.


The Ss were equally balanced among groups on 
the basis of their experimental histories: 20 Ss 
were assigned to Group 12-1 and 8 Ss to Group 
1-1. In addition, half of the Ss in each group were 
assigned Black as the runway with the larger re- 
ward (MFA); half were assigned White as the 
MFA. In the case of Ss in Group 1-1, the desig-
nation was arbitrary; these Ss received one 45-mg.
pellet in either runway for the entire experiment. 
The Ss in Group 12-1 received 12 pellets in the 
MFA and 1 pellet in the LFA during Preshift,
which comprised the first 11 days (44 trials) of 
training. During Postshift (Days 12-17; Trials 45- 
68), all Ss received one pellet reward on all trials 
in either runway. 
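
The trial and day boundaries just given follow directly from four trials per day. As a check on the arithmetic, the sketch below maps a trial number onto its day, its stage, and the number of pellets delivered. It is written in Python; the function names and the string labels for groups and alleys are our own conventions, introduced only for illustration.

```python
TRIALS_PER_DAY = 4
PRESHIFT_DAYS = 11   # Days 1-11, Trials 1-44
TOTAL_DAYS = 17      # Postshift is Days 12-17, Trials 45-68


def day_and_stage(trial):
    """Map a 1-based trial number to (day, stage) for Experiment III."""
    if not 1 <= trial <= TRIALS_PER_DAY * TOTAL_DAYS:
        raise ValueError("trial number outside the experiment")
    day = (trial - 1) // TRIALS_PER_DAY + 1
    stage = "Preshift" if day <= PRESHIFT_DAYS else "Postshift"
    return day, stage


def reward_pellets(group, alley, stage):
    """Pellets delivered; group is '12-1' or '1-1', alley is 'MFA' or 'LFA'."""
    if group == "12-1" and alley == "MFA" and stage == "Preshift":
        return 12
    return 1  # every other combination receives one pellet


for trial in (1, 44, 45, 68):
    day, stage = day_and_stage(trial)
    print(trial, day, stage, reward_pellets("12-1", "MFA", stage))
```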


Results and Discussion


In general, the SimCEs and SucCEs oc- 
curred with this present procedure just as 
they had when the T-maze apparatus was 
used. However, the effect of brightness was 
quite different in this procedure in which 
S was exposed to only one brightness at a 
time. Terminal running speed for the three 
conditions during Preshift may be seen in 
Figure 10. Speed to the MFA on the last 
2 days of the Preshift stage was greater 
the larger the reward, both when the 
Black runway held the larger reward 
(t = 2.48, p < .05) and when White was
associated with the larger reward (t =
5.54, p < .01). The SimCE occurred dur-
ing these last 2 days of the Preshift stage
as Group 12-1 had slower mean speed to
the White LFA than Group 1-1 (t = 2.29,
p < .05), although the numerical SimCE
obtained by Ss in Group 12-1, which re-
ceived their one pellet in the Black run-
way, did not attain statistical significance
at the .05 level. This is believed to be
largely a consequence of the fact that this
group had faster mean running speeds to 
begin with, as evidenced by running on the 
first few trials of Preshift. 

Following the decrease in reward on the 
MFA for the 12-1 groups, behavior was 
consistent with that previously obtained in 
the T maze. In particular, running speed
to the MFA decreased appropriately, while 
running speeds to the LFA were not so 
greatly affected. The successive depression 
effect was weak and its reliability not too 
convincing. In terms of total Postshift 
speed to the MFA, the performance of 
Group 12-1 with the Black MFA did 
undershoot that of the baseline control 
(t = 2.10, p < .05), but the performance
of Group 12-1 with the White MFA did 
not differ from its baseline (t = .80). 

Thus it was apparent that the SucCE 
was no stronger under these conditions 
than in the previous experiments with the 
T maze. Where it was reliably measured 
in Group 12-1 with the Black runway as 
the MFA, the CE was perhaps inflated by 
the spurious increase in running speed 
there by Group 1-1. Finally, it was clear 
that the decrease in behavior during the 
Postshift stage obtained in the White run- 
way was no greater than that obtained in 
the Black runway; indeed, the opposite 
trend occurred. 

The SimCEs and SucCEs in running
speed, then, occurred in about the same way
in the dual-runway situation as in the T-
maze experiments. However, the effect of
brightness when presented separately did
not produce the same trend occurring in 
the T maze. In any case, not a great deal 
can be made of the relative interacting ef- 
fects in the two situations: the dimension 
existing in the T maze was only approxi- 
mated in the dual-runway situation, and 
the interacting effects in the T maze con- 
tributed only a minor source to the vari- 
ance, anyway. 


Experiment IV


The final experiment in this series was 
designed to answer certain general ques- 
tions relevant to understanding the effects 
of a shift in magnitude of reward, and 
other questions arising from previous re-
sults with SimCEs. The following were
considered: (a) To what extent is a 
SucCE modified as a consequence of S's 
prior experience with a SimCE and/or the 
Postshift magnitude of reward? (b) Does 
a SimCE exist when a discrimination is 
formed between stimuli associated with 
some reward and stimuli associated with 
no nominal reward (the typical discrimi-
nation task)? (c) To what extent does a
simultaneous elation effect occur, and is it 
comparable in magnitude to the simul- 
taneous depression effect? (d) What is the 
effect of distribution of trials on the simul- 
taneous and successive depression effect 
when more than one trial per day is given? 
These four basic problems of Experiment
IV are elaborated below. 

a. The SucCE as a function of rein- 
forcement history. The question was 
whether the extent of a SucCE is affected 
by previous experience with a SimCE 
and/or the magnitude of reward presented 
in the LFA during the Postshift stage. In 
this experiment, as before, Postshift per- 
formance to one pellet on either side of a 
T maze was compared as a function of the 
reward previously obtained in these alter- 
natives during Preshift. Performance on 
the MFA as a function of prior LFA re- 
ward was of prime relevance. Specifically, 
this question was concerned with the rela- 
tive successive depression effects obtained 
in the MFA for Ss which had alternative 
Preshift rewards of 12 pellets and 12 pel- 
lets (Group 12-12), 12 pellets and 0 pellets 
(Group 12-0), and 12 pellets and 1 pellet 
(Group 12-1). From past data, it was cer- 
tain that Group 12-1 would reveal a simul- 
taneous depression effect during the Pre- 
shift stage. Group 12-1 also had Preshift 
experience with the Postshift reward of one 
pellet. Obviously, Group 12-12 would have 
neither of these experiences during Pre- 
shift. It was an empirical question whether 
Ss in Group 12-0 would reflect a SimCE
during the Preshift stage; certainly they
would not have experienced the Postshift
reward prior to the shift. Thus, this design
permitted a comparison of the extent to
which the SucCE would occur for Ss
which previously had experienced both a
SimCE and the Postshift magnitude of
reward, relative to Ss which had experienced
neither of these and to Ss which had ex-
perienced only the SimCE.

Most critical concern was given this 
latter group—Group 12-0. If their perform- 
ance reflected a SimCE during the Preshift 
stage, it would have suggested that these 
Ss responded to zero reward as if it were, in 
fact, a small nominal reward. The alterna- 
tive would be that response to zero reward 
is unique, that it reflects essentially zero 
behavior which is unaffected by S's experi- 
ence with other rewards elsewhere, and 
that it therefore should not enter into a 
SimCE. It is clear that if it were found
that Ss in Group 12-0 were responding to 
their LFA as if small reward were present 
there, their behavior should then have been 
more similar to that of Ss in Group 12-1 
than Group 12-12. On the other hand, if it 
were the case that the reinforcement con- 
ditions in the MFA and LFA operate inde- 
pendently on Ss in Group 12-0—that is, if 
no SimCE exists—then there would have 
been no reason to expect the Postshift per- 
formance in the MFA by this group to be 
any different from that found in Group 
12-12. It may be noted that these condi- 
tional predictions are essentially atheoreti- 
cal, at least to the extent that both a per- 
ceptual and emotional interpretation of 
CEs would appear to make essentially the 
same predictions (see Discussion). 

b. The SimCE on zero reward. The con- 
ventional discrimination task includes a 
choice between stimuli associated with 
some reward and stimuli associated with 
no reward. Does a SimCE exist under 
these circumstances? This question was 
answered in the present experiment by com- 
paring the running speeds to a nonre- 
warded LFA for Ss having 12, 1, or 0 pel- 
lets on the MFA (Groups 12-0, 1-0, and 
0-0, respectively). To the extent that run- 
ning speed to this nonrewarded LFA is in- 
versely related to the magnitude of reward 
on the MFA, a SimCE would have been 
defined in the form of a depression effect. 

c. The simultaneous elation effect. To 
determine whether a simultaneous elation 
effect also occurred in this situation, run-
ning speed to the MFA was compared for
Ss in Groups 12-0, 12-1, and 12-12. To the 
extent that MFA speeds were inversely
related to magnitude of reward on the 
LFA, a SimCE would be defined in the 
form of an elation effect. 

d. The effect of distribution of trials on 
the SimCE and SucCE. Although a SimCE
in the form of a depression effect had been 
readily obtained in Experiments I, II, and 
III of the present report as well as in 
other research (e.g., Spear & Hill, 1965), 
one experiment from our laboratory con- 
spicuously failed to demonstrate it (Spear 
& Pavlik, 1966). Spear and Pavlik em- 
ployed the same procedure as in our other 
experiments (including the identical ap-
paratus and Es as in Experiment II of the
Spear and Hill paper) with the exception 
of the distribution of trials: Spear and 
Pavlik gave only one trial per day. Not
only did they fail to obtain a simultaneous 
depression effect during Preshift, they 
found running speed to the LFA reliably 
greater for Ss in Group 12-1 compared to 
Group 1-1, and they also found running 
speed to the MFA to be greater for Group 
12-1 than for a group given 12 pellets on 
either alternative (which defined a simul- 
taneous elation effect). They presented 
evidence that this was not due to the fact 
that the experiments employing more than 
one trial a day had resulted in animals 
which were differentially satiated on food 
and that their results were not due to any 
chance error of having unusually fast Ss 
in their Group 12-1. Apparently, the em- 
ployment of only one trial per day caused 
the difference. 

It became clear that an experiment was 
needed in which similar conditions were 
employed but in which intertrial interval 
was varied. If intertrial interval were the 
critical factor, this fact should show up 
when more than one trial is given per day 
(assuming the differential in interval is 
sufficient to produce differential behavior).

Thus, several conditions were included in 
the present experiment in which treat- 
ment differed in terms of the magnitude of 
reward which appeared in the alternatives 
of the T maze. Orthogonal to the magni- 
tude of reward variable, intertrial interval 
was varied: in each condition, one subgroup
of Ss received a 15-sec. intertrial interval
and the other subgroup received a 15-min. 
intertrial interval between each of their six 
trials given in a single day. To improve 
comparability, all conditions were run an 
equal number of times in each of several 
replications of the experiment. However, 
basic concern was with several relatively 
disjoint questions asked within this experi- 
ment rather than with the complete fac- 
torial design as it eventually developed. 
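
The definitions in points b and c above share one form: a contrast effect is said to occur when speed to one alternative is inversely related to the reward available in the other. A purely ordinal version of that check can be sketched in Python as follows. The mean speeds are hypothetical numbers chosen for illustration and are not results of this experiment; the function name is ours, and the check is no substitute for the analyses of variance reported below.

```python
def inversely_related(speed_by_other_reward):
    """True if mean speed falls strictly as reward in the OTHER alternative rises.

    speed_by_other_reward maps pellets in the other alternative -> mean speed.
    """
    rewards = sorted(speed_by_other_reward)
    speeds = [speed_by_other_reward[r] for r in rewards]
    return all(faster > slower for faster, slower in zip(speeds, speeds[1:]))


# HYPOTHETICAL mean committed speeds (ft/sec), for illustration only.
# Depression test: speed to a nonrewarded LFA, keyed by pellets on the MFA
# (Groups 0-0, 1-0, and 12-0).
lfa_speed_by_mfa_reward = {0: 1.2, 1: 1.0, 12: 0.7}
# Elation test: speed to a 12-pellet MFA, keyed by pellets on the LFA
# (Groups 12-0, 12-1, and 12-12).
mfa_speed_by_lfa_reward = {0: 2.1, 1: 2.1, 12: 2.0}

depression_pattern = inversely_related(lfa_speed_by_mfa_reward)
elation_pattern = inversely_related(mfa_speed_by_lfa_reward)
print(depression_pattern, elation_pattern)
```

With these made-up values the depression pattern holds and the elation pattern does not, because two of the MFA speeds are tied.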


Method 


Subjects and apparatus. The Ss were 96 experi- 
mentally naive female albino rats of the Sprague- 
Dawley strain, approximately 60 days old at the 
start of training and weighing 180-200 gm. The 
T maze, housing of Ss, and running procedure 
(moving of cage rack outside the testing cubicle) 
were the same as in Experiment I. The inserts 
were also as used in Experiment I, with White 
always on the left and Brown always on the right. 

Procedure. The prehandling and deprivation 
schedule was essentially the same as in Experi- 
ment I. The Ss received ad lib access to water and 
were given a total of 10.4 gm. of food daily con- 
sisting of pellets and chow. 

Experiment IV was run in four replications of 
24 Ss each. Each replication contained two Ss from
each of 12 groups, one of which was assigned 
Brown (right) as the MFA; the other, White 
(left) as the MFA. The 12 groups comprised a 
6 × 2 factorial design varying Preshift magnitude
of reward and intertrial interval. Half of the Ss 
received their six daily trials in relatively rapid 
succession, being returned to their home cages for 
about 15 sec. between trials; the remainder were 
run in rotation with a resulting 12-15-min. inter- 
trial interval. An individual S experienced the 
same intertrial interval throughout the entire ex- 
periment. The 12 groups were: 0-0-M, 0-0-S, 1-0-M, 
1-0-S, 1-1-M, 1-1-S, 12-0-M, 12-0-S, 12-1-M, 12-1-S,
12-12-M, and 12-12-S. The first number indicates 
Preshift magnitude of reward (number of pellets) 
on the MFA, the second number indicates Pre- 
shift magnitude of reward on the LFA, M is 
massed trials (15-sec. intertrial interval), and S is 
spaced trials (12-15-min. intertrial interval). Dur-
ing Postshift, all Ss received one pellet on either
alternative. 
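
The group labels and cell sizes described above can be generated mechanically, which makes the counts easy to verify. The sketch below, in Python, crosses the six Preshift reward conditions with the two trial spacings and then assigns one Brown-MFA and one White-MFA S per group in each of the four replications. The variable names are ours; the labels follow the MFA-LFA-spacing convention given in the text.

```python
from itertools import product

# (MFA pellets, LFA pellets) during Preshift
REWARD_CONDITIONS = [(0, 0), (1, 0), (1, 1), (12, 0), (12, 1), (12, 12)]
SPACINGS = ["M", "S"]  # M = massed (15-sec. ITI), S = spaced (12-15-min. ITI)
REPLICATIONS = 4
MFA_SIDES = ["Brown (right)", "White (left)"]

# 6 x 2 factorial: twelve group labels such as "12-1-S"
groups = [f"{mfa}-{lfa}-{spacing}"
          for (mfa, lfa), spacing in product(REWARD_CONDITIONS, SPACINGS)]

# One S per (replication, group, MFA side) cell
assignments = [(rep, group, side)
               for rep in range(1, REPLICATIONS + 1)
               for group in groups
               for side in MFA_SIDES]

ss_per_group = len(assignments) // len(groups)
print(groups)
print(len(assignments), ss_per_group)
```

This gives 24 Ss per replication, 8 Ss per group, and 96 Ss in all, which agrees with the number of Ss stated under Subjects.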

The running procedure was the same as in Ex- 
periment I, except that Ss now received only one 
free trial per day. For Ss in replications one and 
three, the first trial of each day was free, Trial 2 
was forced to the side opposite that chosen on 
Trial 1, and Trials 3-6 were forced on a random 
basis, half to each alternative for each S each day. 
For Ss on replications two and four, Trial 5 of 
each day was free, Trial 6 forced to the side op- 
posite that chosen on Trial 5, and Trials 1-4 were 
forced on a random basis, half to each alternative.
Preshift, as in Experiment I, comprised Trials
1-48 (Days 1-8). Postshift was given through
Trials 49-144 (Days 9-24). 
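
The daily order of free and forced trials described above can be written out as a small routine. The Python sketch below builds one day's six trials for a single S. It takes the side chosen on the free trial as an input, although in practice that choice is known only once the free trial has been run; the function names and the tuple representation are ours, and the seeded random generator is used only to make the example repeatable.

```python
import random


def opposite(side):
    """Return the other T-maze alternative."""
    return "LFA" if side == "MFA" else "MFA"


def daily_plan(replication, free_choice, rng):
    """Return six (kind, side) tuples for one day of Experiment IV.

    free_choice is the side S picked on that day's free trial.
    """
    forced_sides = ["MFA", "MFA", "LFA", "LFA"]  # half to each alternative
    rng.shuffle(forced_sides)
    random_block = [("forced", side) for side in forced_sides]
    free_block = [("free", free_choice), ("forced", opposite(free_choice))]
    if replication in (1, 3):   # Trial 1 free, Trial 2 forced opposite
        return free_block + random_block
    if replication in (2, 4):   # Trial 5 free, Trial 6 forced opposite
        return random_block + free_block
    raise ValueError("replication must be 1, 2, 3, or 4")


plan_rep1 = daily_plan(1, "MFA", random.Random(0))
plan_rep2 = daily_plan(2, "LFA", random.Random(0))
print(plan_rep1)
print(plan_rep2)
```

Whatever S chooses on the free trial, the plan yields three trials to each alternative per day, so experience with the MFA and LFA stays equated across the 6 trials per day that make up Preshift (Trials 1-48) and Postshift (Trials 49-144).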

Response measures. Only the committed speed 
and choices are reported for this experiment. 
Turning speed replicated previous results: less 
sensitivity than committed speed to the successive 
shift in reward, and a high correlation with choice. 
Turning speed may be a somewhat misleading 
measure anyway, particularly in a position-dis- 
crimination task. First, it exaggerates the differ- 
ence between MFA and LFA speed because the 
initial turning motion of S toward the MFA may 
take place before the forcing door is seen. More-
over, the SimCE obtained with this measure may
be inflated by the initial turn toward the MFA; 
by comparison with committed speed, it is clear 
that response strength to the MFA is exaggerated 
in this way when the turning-speed index is used. 
Although in terms of competing responses turning 
speed is as legitimate an index as any, even though 
this artifact does occur, we prefer to view the 
SimCE as primarily due to slower approach to 
the LFA rather than a consequence of greater in- 
centive from such a specific competing alternative. 
Committed speed provides a somewhat purer 
measure in this respect. Finally, turning speed 
may provide a spuriously slow index of response 
strength in groups with equal magnitudes of al- 
ternative reward because this measure includes 
the time taken to VTE and otherwise resolve 
conflict at the choice point. 


Results 


Preshift choice behavior. As expected, 
the average preference for their MFA by 
those Ss choosing between equal reward 
magnitudes (Groups 0-0, 1-1, and 12-12) 
was about equal throughout to that for 
their LFA. Therefore, analyses of choice 
behavior presented below concern only 
those Ss receiving differential rewards in 
the alternatives (Groups 1-0, 12-0, and 
12-1). A chi-square test on total number 
of choices of the MFA during Preshift re- 
vealed that reward magnitude reliably 
affected choice behavior. Groups 12-0 and 
12-1, which did not differ, had more choices 
of the MFA during Preshift than did 
Group 1-0, χ²(2) = 13.71, p < .001. The
overall mean probability of choosing the 
MFA was .87 for Group 12-0, .86 for Group 
12-1, and .74 for Group 1-0. No reliable 
effect of trial spacing was obtained; the 
overall mean probability of choosing the 
MFA was .83 for Ss given massed trials 
and .81 for Ss given spaced trials. There
was no interaction between trial spacing
and reward magnitude. 

It is perhaps worth mentioning that the 
relative and absolute preference for the 
MFA among the various experimental 
conditions was the same whether choice 
behavior was measured on the first trial or 
the fifth trial of each day. This has been 
the typical finding in experiments employ- 
ing the present paradigm and is relevant to 
questions concerning both the effect of 
distribution of trials and the potential food 
satiation when more than one trial is given 
per day. The first trial of a day follows 
by more than 22 hours the immediately 
preceding trial, while the intertrial interval 
preceding the fifth daily trial is necessarily 
much less than 22 hours. 

Change in behavior during early Pre- 
shift trials. The early growth of the dis- 
crimination may be assessed in terms of 
the progressive change in the differential of 
behavior directed toward the MFA com- 
pared with that directed toward the LFA. 
The specific analysis chosen in this case 
(a three-way mixed analysis of variance) 
compared the mean difference in MFA and 
LFA committed speeds within Groups 
12-0, 12-1, and 1-0 on Preshift Days 1 and 
2 with that on Preshift Days 3 and 4. 

It was found that the extent to which 
the MFA speed was greater than the LFA 
speed was directly related to the difference 
in the alternative reward magnitudes. That 
is, from the greatest difference between 
MFA and LFA speeds to the least differ- 
ence, the groups were ordered 12-0, 12-1, 
1-0, F(2,42) = 6.66, p < .005. There was 
no reliable main effect of distribution of 
trials, nor did this variable enter into any 
reliable interaction. The difference be- 
tween LFA and MFA speed was, of course, 
greater on Days 3 and 4 than on Days 1 
and 2, F(1,42) = 10.03, p < .005, and
there was a reliable interaction of reward 
magnitude with days, F(2,42) = 10.12, 
p < .001. This latter result reflected the 
increasingly more rapid discrimination 
formed in those groups with the greater 
differential in their alternative magnitudes 
of reward. 

20 NORMAN E. SPEAR AND JOSEPH H. SPITZNER

Fig. 11. These comparisons in terms of LFA
speed represent a test for the simultaneous depres-
sion effect. Mean committed speed (ft/sec) to the LFA and
MFA is shown during the last two Preshift days
(for massed and spaced Groups 12-1, 12-0, 1-1, and
1-0) in Experiment IV. Panel labels: Massed-MFA, Spaced-MFA.

The same differences were obtained in
terms of mean committed speed on the
final two Preshift days (see Figure 11). 
The difference between MFA and LFA 
speeds remained greater the greater the dif- 
ferential in alternative rewards, F(2,42) = 
5.42, p < .01. Also at this point there was 
no effect of distribution of trials, and trial 
distribution did not interact with reward 
magnitude (F < 1 in each case). 

Simultaneous elation effects at the end of 
the Preshift stage. Considering first the 
mean committed speed to the MFA, an 
analysis of variance including Groups 12-1, 
12-0, 1-1, and 1-0 was performed with 
MFA reward magnitude, LFA reward mag- 
nitude, and distribution of trials as fixed 
orthogonal variables. The relevant com- 
parisons may be seen in Figure 11 (MFA 
speed). 

An elation effect would have been de- 
fined if MFA speed were greater the less 
the reward magnitude on the LFA (Groups 
12-0 and 1-0 compared to 12-1 and 1-1). It 
was found that neither reward magnitude 
on the LFA, nor distribution of trials, nor 
any interactions produced reliable vari- 
ance (F values ranged from .42 to 1.22 for 
all nonsignificant effects); thus, no elation
effect was found in terms of this analysis.
The only reliable source of variance in this 
case was greater speed to the MFA the 
greater the reward magnitude on the MFA, 
F(1,56) = 5.27, p < .05. 

Committed speeds on the MFA were 
also examined by another analysis for evi- 
dence of an elation effect. Recall that a 
simultaneous elation effect would be de- 
fined if MFA speeds to a common MFA 
reward magnitude were inversely related 
to the magnitude of reward on the LFA.
Thus, the MFA speeds for Ss in Groups 
12-0, 12-1, and 12-12 (see Figure 12) were 
compared under the two levels of trial 
distribution by an analysis of variance.
Neither the effects of reward magnitude 
nor trial spacing approached significance 
at the .05 level—no elation effect occurred. 
The F value for the nonsignificant inter-
action, F(1,42) = 2.61, p > .05, was clearly
inflated by the food-satiation effect which 
occurred in Group 12-12 when massed 
trials were given. Thus, it is concluded
that no evidence for a simultaneous elation
effect was obtained under these circum-
stances.

FIG. 12. This comparison represents a test for
the simultaneous elation effect. Preshift committed
speed to the MFA is shown for massed and spaced
Groups 12-0, 12-1, and 12-12 in Experiment IV.

Simultaneous depression effects at the 
end of the Preshift stage. A basic aim of 
this experiment was to determine whether 
SimCEs occur in the conventional dis- 
crimination task involving a choice be- 
tween some reward and no reward. With 
the present design, it was possible to go 
one step further and to examine the extent 
to which such a SimCE might be com- 
parable in magnitude to the SimCE ob- 
tained when the choice is between a large 
and a small reward. The appropriate anal- 
ysis included a comparison of LFA com- 
mitted speeds by Groups 12-1, 12-0, 1-1, 
and 1-0 on the last 2 days of the Preshift 
stage (see Figure 11). Thus, the three 
orthogonal variables in this analysis were 
reward magnitude on the MFA, reward 
magnitude on the LFA, and distribution of 
trials.

Overall, the simultaneous depression 
effect did occur as speed to the LFA was 
less the greater the reward magnitude on 
the MFA, F(1,56) = 15.51, p < .001. Per- 
haps more important is the fact that this 
simultaneous depression effect did not 
differ whether the choice was between 
something and nothing or whether the
choice was between something large and 
something small, as reflected by the ab- 
sence of interaction between MFA and 
LFA reward magnitude, F(1,56) = .06. 
Speed to the LFA was greater when one 
pellet was the reward in the LFA than 
when no pellets were present, but neither 
the effect of distribution of trials nor any 
remaining interaction approached statisti-
cal reliability (Fs ranged from .06 to 1.88).
These results imply that the SimCE is
present when S chooses between some
nominal reward and no reward. Further-
more, it appears that the magnitude of this
SimCE does not deviate from that obtained
in a choice between a large and a small
reward. Finally, the absence of interaction 
between MFA reward and distribution of 
trials shows that our prediction in this 
respect was incorrect: the SimCE does not 
decrease with longer intertrial interval. 
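
The logic of this 2 x 2 comparison can be sketched numerically. The cell means below are hypothetical placeholders chosen for illustration, not the values obtained in Experiment IV, and the function name is likewise an assumption. The main effect of MFA reward is the average depression in LFA speed; the interaction asks whether that depression differs between the something-versus-nothing and the large-versus-small choices.

```python
def simce_contrasts(lfa_speed):
    """Summarize a 2 x 2 table of mean LFA committed speeds.

    lfa_speed maps (mfa_pellets, lfa_pellets) -> mean speed to the LFA.
    Returns (main_effect, interaction) for the MFA-reward factor.
    """
    # Depression when the LFA holds no reward (Group 1-0 minus Group 12-0).
    depression_vs_nothing = lfa_speed[(1, 0)] - lfa_speed[(12, 0)]
    # Depression when the LFA holds one pellet (Group 1-1 minus Group 12-1).
    depression_vs_small = lfa_speed[(1, 1)] - lfa_speed[(12, 1)]
    main_effect = (depression_vs_nothing + depression_vs_small) / 2
    interaction = depression_vs_nothing - depression_vs_small
    return main_effect, interaction


# Hypothetical means (ft/sec), for illustration only.
hypothetical_means = {(12, 0): 2.0, (1, 0): 3.0, (12, 1): 2.5, (1, 1): 3.5}
main_effect, interaction = simce_contrasts(hypothetical_means)
print(main_effect, interaction)  # 1.0 0.0
```

A nonzero main effect with an interaction near zero is the pattern described above: a depression effect of about the same size in both kinds of choice.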

Another way of investigating the SimCE 


when the discrimination is between stimuli 
associated with some reward versus no re- 
ward is by analyzing the committed speeds 
on the last 2 days of the Preshift stage of 
Groups 12-0, 1-0, and 0-0 orthogonal to the 
distribution-of-trials variable. This analy- 
sis revealed a slight complication. The 
reliable effect of reward magnitude, 
F(2,42) = 3.89, p < .05, largely reflected
the uniformly greater LFA speed by Ss in 
Group 1-0 compared with Ss in Group 
12-0. However, the mean LFA speeds in 
Group 0-0 were only slightly greater than 
those in Group 12-0, and this was the case 
only under conditions of spaced trials. This 
effect was not great enough to result in a 
reliable interaction (Distribution of Trials 
X Reward Magnitude) (F < 1), though it 
did contribute to the significantly greater 
LFA speeds overall for Ss run under spaced 
trial conditions, F(1,42) = 5.08, p < .05.
It is not unlikely that a condition such as 
that of Group 0-0, in which no portion of 
behavior is under the control of food re- 
ward, may not provide an adequate base- 
line for the estimation of CEs in Ss whose 
behavior is under the control of food re- 
ward to at least some extent. 

Choice behavior during the Postshift 
stage. Since all Ss in Groups 12-0, 12-1, 
and 1-0 chose the MFA on the last free 
trial of the Preshift stage and also chose 
this alternative during each free trial on 
the first 2 days of Postshift, the change 
in choice behavior was analyzed in terms 
of the trial on which the formerly LFA 
was first chosen following the shift in re- 
ward magnitude in the MFA. Preshift re- 
ward magnitude made little difference in 
this respect. The mean Postshift trials of 
the first LFA choice were 8.19, 9.19, and 
8.25, respectively, for Groups 12-0, 12-1, 
and 1-0; and the respective mean prob- 
abilities of choosing the former MFA 
throughout Preshift were .79, .77, and .76. 

However, those Ss given spaced trials 
persevered in choosing the formerly MFA 
to a considerably greater extent than those 
given massed trials (see Figure 13). The 
reliability of this was substantiated by a 
Mann-Whitney U test in terms of the trial 
on which the first LFA choice was made 
during Postshift, U = 30.25, p < .001. 

FIG. 13. The course of choice behavior following
the shift in reward as a function of distribution of 
trials. Scores for Groups 12-0, 12-1, and 1-0 are 
combined. 


Combining the differentially rewarded 
groups, it was found that the mean free 
Postshift trial on which the first LFA 
choice was made was 5.92 for Ss given 
massed trials and 11.08 for Ss given spaced 
trials. There was no interaction between 
Preshift reward and distribution of trials. 
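
For readers unfamiliar with the Mann-Whitney U test used for the massed versus spaced comparison, the following is a minimal sketch of how the statistic is computed from ranks, with midranks assigned to ties. The trial numbers are hypothetical and the function name is an assumption; this is not a reconstruction of the authors' computation.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) for two independent samples, using midranks for ties."""
    pooled = sorted(list(sample_a) + list(sample_b))
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        # Advance j past every observation tied with pooled[i].
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # Tied values occupy ranks i+1 .. j; assign their average.
        ranks[pooled[i]] = (i + 1 + j) / 2
        i = j
    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(ranks[x] for x in sample_a)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b


# Hypothetical Postshift trials of the first LFA choice, for illustration only.
massed = [4, 6, 6]
spaced = [6, 9, 12]
u_massed, u_spaced = mann_whitney_u(massed, spaced)
print(min(u_massed, u_spaced))  # 1.0
```

The smaller of the two U values is the one conventionally compared against tabled critical values.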

It is notable that the basic relationship 
between distribution of trials and per-
sistence in choosing the former MFA after
the reward shift did not differ whether 
only the first trial (which was preceded 
by a 24-hr. intertrial interval in both the 
massed and spaced conditions) or the fifth 
trial of each day was considered. This sug- 
gests that the greater perseverance associ- 
ated with more widely spaced trials cannot 
be dismissed as a simple consequence of a 
greater tendency under massed trials to 
alternate stimuli between the fourth and 
fifth daily Postshift trials. Also, this fact is 
not likely a consequence of the greater 
number of errors by the spaced-trial groups 
during Preshift (cf. D’Amato & Jagoda, 
1960) since number of “errors” was, in 
fact, equated for all Ss by employing 
forced trials. 

Committed speed to the MFA following 
the shift in reward magnitude: Test for 
SucCE. There are several ways to evaluate 
the SucCE and several questions concern-
ing its occurrence. First consider com-
mitted speed to the MFA during the first 4 
days of Postshift. With this measure, an 
analysis was completed employing the 
basic factorial design, which included 
Groups 1-0, 1-1, 12-0, and 12-1. These 
speeds are shown in Figure 14, combining 
massed- and spaced-trial groups. Recall 


that three orthogonal variables are MFA 
reward magnitude, LFA reward magnitude, 
and distribution of trials. The only reli- 
able source of variance was found to be 
contributed by the SucCEs: committed 
speed to the MFA was slower for those Ss 
previously having the larger MFA reward 
magnitude, F(1,56) = 10.41, p < .001. 
There was some tendency toward greater 
MFA speed for those Ss (Groups 12-1 and 
1-1) which previously had the larger LFA 
reward, but this effect did not attain sta-
tistical reliability, F(1,56) = 2.94, p > .05.
None of the other five sources of variance 
approached statistical significance (F val- 
ues ranged from .01 to 1.79). Thus the 
SucCE (depression) was defined and pro- 
vided the only reliable source of variance 
in this analysis. 

Test for SucCE (depression) on the
MFA as a function of reinforcement his-
tory on the LFA. The SucCE may be
evaluated by analyzing the effects of Pre-
shift reward magnitude and distribution of
trials as a function of Postshift experience.
When committed speeds to the alternative
shifted from 12 pellets to 1 pellet (Groups
12-0, 12-12, 12-1) are thus compared to
corresponding speeds for the baseline con-
trol Group 1-1, slower overall speed by a
group which formerly received 12 pellets
would define a SucCE. A SucCE would
also be present if the interaction between
Preshift reward magnitude and Postshift
days contributed significant variance, as-
suming the interaction were such that the
group which previously had the larger re-
ward increased their initially depressed
speed relative to the lesser change by
Group 1-1. The general Postshift per-
formance of these four groups may be seen
in Figure 15.

FIG. 14. These three comparisons represent tests
of the SucCE. Postshift committed speeds are
shown to the formerly MFA, summed over massed-
and spaced-trial conditions in Experiment IV.

The MFA committed speeds of Groups 
12-0 and 1-1 were compared over eight 
blocks of 2 Postshift days. A SucCE was 
found summing over blocks of Postshift 
days—speed to the formerly MFA was less 
for Group 12-0, F(1,28) = 6.26, p < .025. 
This SucCE was also reflected by the sig-
nificant interaction of reward magnitude
by blocks, F(7,196) = 5.84, p < .001.
Figures 14 and 15 show that this interac-
tion was a consequence of the increase in
speed over blocks by Group 12-0 follow-
ing the earlier SucCE. The only remaining
significant effect from this analysis (the
triple interaction, F(7,196) = 3.80, p <
.001) is attributed entirely to the inflated
mean speed in Group 1-1 with spaced
trials.*

*It is necessary to comment on the running
speed of Group 1-1 which was given spaced trials.
The mean speed by this group was consistently
greater than that by any other group throughout
the Postshift stage. It is our belief that this is a
spurious result, a matter of sampling error. There
are several bases for this contention. First, in an
absolute sense, the Postshift speed by this group
was greater than that previously obtained under
these conditions with the same apparatus and un-
der conditions of nearly comparable intertrial in-
terval. Second, as can be seen to some extent in
the LFA speeds shown in Figure 16, this group
showed considerable increase from the end of Pre-
shift throughout most of Postshift; this has not
been found previously under these conditions. Fi-
nally, in several previous experiments of this kind,
we have always found that the mean speed to the
MFA by groups shifted in reward magnitude ad-
justs to a common level, that of the baseline con-
trol, by the end of Postshift. This was not the case
in this experiment, but only because of the faster
running by Group 1-1 given spaced trials. Thus, we
conclude that the Postshift speed of this group is
inflated and overestimated due to chance factors.

FIG. 15. Committed speed to the MFA during
initial and terminal portions of Postshift: magni-
tude of the SucCE on the MFA as a function of
prior reward on the LFA.

Independent analyses of variance com- 
paring the MFA speeds of Groups 12-0 
and 1-1 on each block of 2 Postshift days 
determined that the effect of Preshift re- 
ward magnitude no longer contributed re- 
liable variance after the seventh and 
eighth days of Postshift. Thus, the com- 
mitted speed to the MFA of Ss in Group 
12-0 apparently adjusted to the level of 
their baseline control by about the ninth 
and tenth days of Postshift. 

It appeared that the SucCE described 
above for Ss in Group 12-0 was a more 
powerful effect than had been obtained 
previously under conditions comparable to 
those of Group 12-1. In fact, the present 
Group 12-1 did not show a statistically re- 
liable SucCE either in terms of slower 
mean committed speed (MFA) compared 
to Group 1-1, F(1,28) = 2.57, p > .10, or 
in terms of the significant interaction be- 
tween the effects of reward magnitude and 
blocks of Postshift days, F(7,196) = .56.
However, it should be noted that the 
tendency for a SucCE in Group 12-1 was 
present numerically though not statisti- 
cally reliable at the .05 level. Since this 
tendency has been found under these con- 
ditions in a total now of six experiments— 
sometimes attaining statistical reliability 
(at the .05 level) and sometimes not—the 
danger of a Type I error is probably not
too great to permit the conclusion that a 
SucCE was present in Group 12-1. The 
test of this tendency for this effect to be 
weaker in Group 12-1 than in Group 12-0 is 
presented below. 

The SucCE could also be defined in
terms of the Postshift performance of Ss in
Group 12-12. This group had reliably
slower speeds than were found in Group
1-1, F(1,28) = 8.91, p < .01; thus, the
successive depression effect also occurred in
Group 12-12. The fact that this effect oc-
curred most strongly during the early
stages of Postshift is reflected in the reli-
able interaction of Preshift reward mag-
nitude and Postshift blocks, F(4,112) =
2.54, p < .05. Independent analyses of vari-
ance over each block of 2 Postshift days
suggested that the speeds of Group 12-12
had adjusted to the level of Group 1-1 by
the fifth and sixth days of Postshift, when
the effect of Preshift reward magnitude
was no longer reliable (p > .05). The
effect of distribution of trials, though in-
flated by the chance occurrence of Group
1-1 faster speeds under spaced conditions,
did not attain statistical reliability in this
analysis, F(1,28) = 3.00, p > .05, and all
other sources of variance yielded Fs less
than 1.

The results of the above analyses sug-
gest that the successive depression effects
obtained were stronger in Groups 12-0 and
12-12 than in Group 12-1. These groups
differed only in terms of LFA reward ex-
perienced during Preshift. However, the
prior reinforcement conditions were such
that Group 12-12 never had experience with
a SimCE nor the specific magnitude of re-
ward (one pellet) presented during Post-
shift, while Group 12-0 had experienced the
former but not the latter. In order to deter-
mine whether only one or both of these
factors had caused the difference from
Group 12-1 in terms of adjustment to the
Postshift reward, committed speed to the
MFA was compared for these groups dur-
ing the Postshift stage (employing the Re-
ward Magnitude × Distribution of Trials
× Blocks of 2 Postshift Days analysis of
variance as above).

The results supported the conclusion
that both Groups 12-12 and 12-0 were
more affected by the shift from 12 pellets to
1 pellet on the MFA than was Group 12-1
and in the direction of a greater SucCE.
The Postshift committed speeds of Groups
12-12 and 12-0, however, did not differ.
The results of the analysis of variance
comparing Group 12-1 with 12-12 and
that comparing Group 12-1 with 12-0 were
essentially the same. In both cases the
only reliable source of variance, with the
exception of the obvious effect of Post-
shift days, was the interaction between
reward magnitude and blocks of Postshift
days: for 12-1 versus 12-12, F(7,196) =
2.81, p < .01; for 12-1 versus 12-0,
F(7,196) = 3.18, p < .005. This interac-
tion reflected the fact of eventual adjust-
ment to the same level of MFA com-
mitted speeds (at about Postshift Days
6-10) for all groups, but with both of
Groups 12-12 and 12-0 adjusting from a
lower level early in Postshift than was the
case in Group 12-1. No other effects from
these analyses approached statistical reli-
ability; the F values ranged from .01 to
1.38. When the same analysis compared
Groups 12-0 and 12-12, neither the effect
of reward magnitude nor the interaction
between reward magnitude and blocks of
Postshift trials attained statistical reli-
ability (F < 1 in each case). It is con-
cluded that the shift in MFA reward did
not differentially affect Ss in Groups 12-0
and 12-12, but in both cases the effect was
greater than that found in Group 12-1.

Test for SucCE (elation) in group
shifted from zero nominal reward to small
reward. To the extent that the mean speed
in Group 0-0 overshot that in Group 1-1, a
successive elation effect would have been
defined. However, the test (a three-way
analysis of variance comparing com-
mitted speeds of Group 1-1 with those of
Group 0-0 under conditions of massed or
spaced trials and over the first five blocks
of 2 Postshift days) revealed that no suc-
cessive elation effect had occurred. In fact,
Ss having the 1-1 reward condition main-
tained faster running speed than those in
the 0-0 condition when scores were com-
bined across blocks of Postshift days,
F(1,28) = 17.42, p < .001. Thus, it is con-
cluded that no successive elation effect
occurred.

Running speed to the LFA during the 
Postshift stage. Subsequent to a simul- 
taneous depression effect, a reduction in the 
magnitude of the contrastingly large MFA 
reward results in the adjustment of the 
previously depressed performance on the 
LFA: these LFA speeds increase toward 
the level of expected performance, as de- 
fined by the baseline control group (see, 
for example, Experiments I and II of the
present report; Spear and Hill, 1965). This 
adjustment was investigated here as a 
function of reward magnitude on the MFA 
during Preshift, reward magnitude on the 
LFA during Preshift, and distribution of 
trials. Of course, only the performance of
the differentially rewarded groups is rele- 
vant here, in relation to the baseline control 
Group 1-1 and in relation to each other. 
These speeds (see Figure 16) were analyzed 
in several ways, but only a few results war- 
rant consideration here. 

First, the direction of two interactions— 
between MFA reward magnitude and 
blocks of Postshift days, F(1,56) = 11.32,
p < .001, and between LFA reward mag- 
nitude and blocks of Postshift days, 
F(1,56) = 6.30, p < .025—revealed two 
clear and related facts. It showed that the 
increase in LFA committed speeds during 
Postshift was greater the larger the Pre- 
shift reward on the MFA and the smaller 
the Preshift reward on the LFA. Thus, 
these relationships reflect the combined 
effects of absolute and relative reward 
magnitude experienced on the LFA prior 
to the shift. Second, Ss which previously 
had no reward on the LFA showed a greater 
increase in LFA speeds the larger the 
previous MFA reward (for the interaction 
between the MFA reward, LFA reward,
and blocks of Postshift trials, F(1,56) =
6.34, p < .025). This appeared to be a 
consequence of the initially slower LFA 
speeds by Group 12-0 relative to Group 
12-1 in comparison to the smaller differ- 
ence between Groups 1-0 and 1-1. It is 
clear that this effect of MFA reward is 
one reflection of the SimCE. 

Finally, since a powerful simultaneous
depression effect had been found in Group
12-0 relative to Group 1-0 during Pre-
shift, we may ask how this effect was
altered during the initial stages of Post-
shift after reward had been decreased in
the MFA and increased in the LFA for
Group 12-0, but only increased in the LFA
for Group 1-0. A three-way analysis of
variance was completed comparing the
effects of Preshift reward magnitude and
distribution of trials on the last block of
2 Preshift days and the first block of 2
Postshift days. Although the overall simul-
taneous depression effect was, of course,
reliable, F(1,28) = 19.28, p < .001, no
other effects approached statistical sig-
nificance. This suggests that the effects of
the reward shift on the first 2 days of
Postshift were not great enough in terms
of LFA committed speed for these groups
to alter the effects that had obtained during
the terminal stages of Preshift.

for massed and spaced Groups 12-1, 12-0, 1-1, and
1-0 in Experiment IV.


Discussion 
The most important results in this ex- 
periment concern differences in committed 
speed; there was again no evidence for a
SucCE in choices. This was true even for 
Group 12-0, which was shifted both to 
larger LFA and smaller MFA rewards. As 
usual, greater preference for the MFA was 
obtained the greater the differential in the 
alternative reward magnitudes, and this 
preference was never reversed when the 
alternative rewards became equal. 

Perhaps the one interesting result in 
choice behavior was the greater persistence 
in choice of the formerly MFA following
the reward shift for those Ss run under 
conditions of spaced trials. A similar re- 
sult has been reported by Clayton (1964) 
for rats in a discrimination-reversal task 
(although the analogy with the present 
paradigm is not perfect, the same processes 
may be involved). Clayton found that Ss 
run under a 20-min. intertrial interval 
were more reluctant to discontinue their 
choice of the original MFA than were Ss 
run under an intertrial interval of 10 sec. 
or 3 min. He interpreted this in terms of 
greater forgetting during the longer inter- 
trial interval. Both his data and interpreta- 
tion are compatible with similar runway 
phenomena (Hill, Erlebacher, & Spear,
1965; Spear, 1965; Spear, Hill, & O'Sulli- 
van, 1965) and also would fit the present 
results rather well. It is surprising that 
committed speed was not similarly affected 
by the differential intertrial interval. There 
is, however, evidence that running speed is 
less susceptible to the effects of a retention 
interval than is behavior (such as choice) 
that reflects the efficiency of the discrimina- 
tion (Spear, Hill, & Cotton, 1962). 

Simultaneous contrast effects. Although 
the simultaneous depression effect occurred 
as usual, no evidence could be found for a 
simultaneous elation effect. This is not in- 
consistent with the results obtained in the 
straight runway with the SucCE paradigm.
An elation effect, defined in terms of reli- 
ably greater speed in a group shifted from 
small to large reward compared with a 
baseline group which has always had the
large reward, has rarely, if ever, appeared 
in the literature. For example, neither the 
classic experiments of Crespi (1942) nor 
Zeaman (1949) obtained results which 
satisfied this definition of an elation effect. 


As Crespi himself pointed out, his Ss which 
showed the “elation effect” had originally
received large reward prior to being re- 
warded with small, then shifted again to 
large, reward; and Zeaman lacked defini- 
tive control groups. Perhaps the absence of 
an elation effect is a consequence of physio- 
logical limits imposed upon running speed 
in rats. If not, this fact remains a distinct 
problem for perceptual interpretations of 
CEs (e.g., Bevan, 1963). 

Another finding established the occur- 
rence of a SimCE (a “depression” effect) in 
the standard discrimination paradigm in 
which S chooses between some and no re- 
ward. This conclusion was based on the 
finding of slower running speed to zero re- 
ward for Ss in Group 12-0 compared with 
those in Group 1-0; it was not so un- 
equivocal when the behavior of Group 0-0 
was considered as a baseline control. How- 
ever, there are several reasons why the 
behavior of Ss in Group 0-0 (who never 
experienced nominal reward during the 
Preshift stage) may not be considered an 
appropriate baseline. First, no portion of 
the behavior of these Ss was under the 
control of nominal reward during Pre- 
shift; this was not the case for Ss in Groups 
12-0 and 1-0. Second (and this may be a 
reflection of the first point), there was a 
distinct tendency for faster running under 
spaced-trial conditions by Ss in Group 
0-0, and this was the case in only one 
group (12-12) which received reward 
during Preshift. Furthermore, this be- 
havior of Ss in Group 12-12 could be at- 
tributed to a factor obviously not present 
in Group 0-0—greater food satiation un- 
der conditions of massed compared with 
spaced trials. Finally, there were observa- 
tions by E that the behavior of Ss which 
had never received food reward in the ex- 
perimental situation was qualitatively dis-
tinct. This behavior may be described as
hyperactive, hyperreactive, skittish, and 
highly variable. In the course of running 
some 8 or 10 experiments in our laboratory 
which have included similar conditions, 
we have invariably obtained the same (un- 
solicited) report from many different Es. 

The third important result was the fact 
that distribution of trials per se had no 
reliable effect on the SimCE. Of course,
this conclusion is limited to the range of 
intertrial interval employed here. How- 
ever, this range often has been shown to 
produce substantial differences in other be- 
haviors (e.g., Clayton, 1964; Spear, 1965). 

The lack of an effect of intertrial inter- 
val is important because of the results ob- 
tained by Spear and Pavlik (1966) with 
this paradigm under conditions of one trial 
per day. Spear and Pavlik did not obtain 
a simultaneous depression effect. The criti- 
cal feature appeared to be the fact that 
only one trial per day was given and the 
intertrial interval (24 hr.) was, therefore, 
much longer than that employed when the 
SimCE (depression) was obtained. The 
present experiment has shown, however, 
that similar differential effects probably 
cannot be obtained simply by varying the 
intertrial interval within a daily session of 
trials. It is still possible, of course, that a 
greater range of intertrial interval (for 
example, 15 sec. versus several hours)
may produce the implied interaction. In- 
deed, in terms of numerical differences, 
the SimCE (12-1 versus 1-1) was slightly 
greater with 15-sec. than 15-min. intertrial 
interval (see Figure 12); and 15 min. is 
considerably less than 24 hr. However, the 
complete absence of statistical reliability in 
this case—and the opposite numerical re- 
sult when Groups 12-0 and 1-0 defined the 
SimCE—make this possibility seem less 
likely. 

Another possibility is that the specific 
processes responsible for the effects ob- 
tained in the Spear-Pavlik experiment 
were somehow correlated with some aspect 
of the operation of presenting one trial 
per day. However, the specific nature of 
these processes, if they exist, is not at all
clear. 

Successive contrast effects. Following the 
shift from 12 pellets to 1 pellet in the 
MFA, the effect on behavior early in Post- 
shift was determined by the nature of the 
alternative rewards experienced prior to 
the shift. Those Ss which had not experi- 
enced the Postshift magnitude of reward 
before the shift slowed their running speed 
to a greater extent after the shift. Specifi- 
cally, the results indicated that the SucCE 


for Ss in Group 12-0 was about equal to 
that in Group 12-12, but greater in both of 
these groups than in Group 12-1. Now it is 
not difficult to understand that the effect 
of the shift should be greater in Group 
12-12 than in Group 12-1. In fact, both a 
perceptual and an emotional interpretation 
of CEs apparently can accommodate this 
fact with ease, in contrast to the SucCE in 
Group 12-0. 

In terms of the adaptation level theory 
proposed by Bevan (1963), it is quite clear 
that the indifference point (defined as that 
reward magnitude judged neither large 
nor small but medium or neutral) would be 
lower for Ss pooling the occurrence of 12 
pellets and 1 pellet over trials (Group 12-1) 
than for Ss receiving no reward magni- 
tude in the experimental situation other 
than 12 pellets (Group 12-12). From this 
viewpoint, the Postshift reward magnitude 
of one pellet, therefore, would appear 
smaller to Ss in Group 12-12 than to Ss in 
Group 12-1 when judged against their re- 
spective indifference points. This per- 
ceptual interpretation would have predicted 
the greater effect of the reward shift in 
Group 12-12 in view of Bevan’s (1963) 
assertion that, “Overall, however, it would 
appear that speed of running, like maze 
performance, varies with the apparent, in 
contrast to the physical, strength of the 
reinforcing agent. [p. 27].” 
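
The pooling argument can be made concrete with a small numerical sketch. It assumes, purely for illustration, that the indifference point is an equally weighted geometric mean of the reward magnitudes experienced, in the spirit of adaptation-level theory; the text above specifies no pooling formula or weights, and the function name is hypothetical. A zero magnitude has no logarithm, so this sketch does not cover Group 12-0.

```python
from math import exp, log


def adaptation_level(magnitudes):
    """Equally weighted geometric mean of nonzero reward magnitudes.

    Assumption for illustration: equal weights and log-mean pooling.
    """
    return exp(sum(log(m) for m in magnitudes) / len(magnitudes))


# Group 12-1 pools 12 pellets and 1 pellet; Group 12-12 experiences only 12.
al_12_1 = adaptation_level([12, 1])     # about 3.46 pellets
al_12_12 = adaptation_level([12, 12])   # about 12 pellets

# Apparent size of the Postshift reward (1 pellet) relative to each level.
apparent_12_1 = 1 / al_12_1     # about 0.29
apparent_12_12 = 1 / al_12_12   # about 0.08

print(al_12_1 < al_12_12, apparent_12_12 < apparent_12_1)  # True True
```

On these assumptions one pellet falls further below the indifference point for Group 12-12 than for Group 12-1, which is the ordering the perceptual interpretation requires.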

Although the interpretation of CEs by
emotional terms has been perhaps less ex-
plicit, Bower (1961) has provided a clever
application of frustration theory. His
interpretation viewed the SimCE as re-
sulting “...from a conflict between antici-
pation of reward (r_g) and the anticipation
of frustration in the S− (LFA) goal box
[p. 199].” The frustration elicited in the
LFA goal box was assumed to occur, at
least in the intermediate stages of dis-
crimination training, “...when the larger
amplitude r_g established in S+ occurs in S−
through stimulus generalization [p. 199].”
It is not inconceivable that this interpreta-
tion might also be applied to the SimCE
found after discrimination had been per-
fect for some time (as measured by the
independent choice measure). The resolu-
tion of this conflict being rewarded in
Preshift (Ss in Group 12-1 nearly always
completed their run to the one pellet avail-
able in the LFA goal box) could account
for the lesser effect of the shift to one
pellet in the MFA for this group. By anal-
ogy with the interpretation of the partial
reinforcement effect in extinction (Amsel,
1962) the Ss in Group 12-1 had their
r_f-s_f conditioned to running via their
eventual approach to the (frustrating)
one pellet during the Preshift stage and
thus would be expected to respond in the
same way to s_f, that is, to run corre-
spondingly, when r_f was elicited in the
MFA during Postshift. Since the Preshift
conditioning of r_f-s_f obviously was not
present in Group 12-12, this group would
be expected to be more disrupted by the
“frustrating” occurrence of one pellet after
the shift.

Apparently, though, both theories have 
difficulty explaining Postshift behavior 
on the MFA by Ss in Group 12-0. In every 
other respect, their behavior was very 
much like that of Group 12-1. This in-
cluded the occurrence of the SimCE in
Group 12-0 relative to Group 1-0, a fact 
which implies that when these Ss were re- 
warded on the MFA they responded to the 
LFA as if the zero reward were a point on 
the continuum of reward magnitude. That 
is to say that Ss in Group 12-0 did not
respond to the zero reward with the same 
absolute running speed as Ss in Group 1-0. 
Rather, they responded to zero reward as if 
it were a smaller reward than that obtained 
on the MFA. Also as in Group 12-1, the 
Group 12-0 Ss nearly always eventually 
completed their run to the nonrewarded 
LFA goal box, even after the discrimina- 
tion between the MFA and LFA had been 
perfect for some time. Thus, from either 
Bevan's concept of reinforcement pooling 
or an interpretation in terms of condi-
tioned r_f-s_f, it would appear that Ss
in Group 12-0 would be expected to per- 
form more like Ss in Group 12-1 than like 
Ss in Group 12-12. 

Obviously it cannot be certain that the 
conditions of Group 12-0 produce the same 
Postshift behavior in the MFA as would 
be found under conditions like Group 12- 
12; probably it does not. The difficult fact 


for theory, however, is that the behavior 
of Ss in Group 12-0 differed in the same 
direction and to about the same extent as 
the behavior of Ss in Group 12-12. Bevan 
(1963) could account for this only by as- 
serting that zero reward is qualitatively 
different from a very small reward. This 
may be true in certain instances (for ex- 
ample, when none of S’s behaviors are 
under the control of nominal reward), but 
this does not appear to apply to the LFA 
behavior of Ss in Group 12-0; at least not 
in view of the SimCE which they ex- 
hibited. Frustration theory would seem to 
have the greater potential in this respect 
by appealing to the specificity of r_F. Per-
haps the critical feature in the 12-0 group
is the absence of conditioned r, for one 
pellet prior to the reward shift. The Ss in 
Group 12-1 already had such an r, condi- 
tioned to certain stimuli in the maze situa- 
tion and could thus make use of it, at least 
in generalized form, in the MFA. But Ss in 
Group 12-0 were required to “start from
scratch” with the conditioning of this new
r,. It may be assumed that r_F — s_F was
experienced equally in the MFA by these 
groups. Thus, the slower initial Postshift 
speeds and slower adjustment to the ex- 
pected level of performance would be ac- 
counted for in Group 12-0 relative to 
Group 12-1. 

Perhaps the most likely possibility is 
that the greater SucCE found in Groups
12-0 and 12-12 compared with 12-1 is not
the result of "contrast effect" at all but of
generalization decrement. A most reliable 
occurrence is reduced running speeds by 
rats contingent upon some change in the 
stimulus situation. The change in reward 
magnitude, especially to a magnitude not 
previously experienced, would constitute 
such a change. Although the SimCE could
not be accounted for by generalization 
decrement, it is likely that some, if not all, 
of the SucCEs which have been measured 
may be due to this factor. Spear and Spitz- 
ner (1965) have cited several diverse 
sources of evidence which point to gen- 
eralization decrement as a primary con- 
tributor to the (operationally defined) suc- 
cessive depression effect. Surely, this factor 
cannot be discounted as a major source of 




the variance in the measured SucCE so
long as a successive elation effect is not 
reliably demonstrated under comparable 
circumstances. 


GENERAL DISCUSSION 


This series of experiments has provided 
some answers relevant to the three general 
points mentioned in the introduction to this 
paper. Consideration of these points is 
given below. 


SimCE and SucCE Compared 


First, a rough comparison of the magni- 
tude of the SimCE and SucCE was possi- 
ble from Experiment IV. In Group 12-12 
these effects were not contaminated by 
prior experience with the Postshift reward 
and/or the SimCE. Group 12-0 had prior 
experience with the SimCE but not with
the Postshift reward. In general, it did not 
appear that the SucCE found in Groups 
12-12 and 12-0 was appreciably different
than the SimCE found in Groups 12-1 and 
12-0. This is a surprising fact in view of 
the predictably greater SimCE due to more 
available comparisons of the contrasting 
rewards, lesser retention interval from one 
reward to the other, etc. For example, the
results in terms of the Placed groups of 
Experiment I had suggested that the 
SimCE was the more robust phenomenon. 
It was noted in Experiment IV, however, 
that the more typical quantitative simi-
larity could be misleading since the SucCE
may be due to factors other than those 
which contribute to SimCEs. For example, 
stimulus generalization decrement may 
contribute heavily to the former but 
probably not to the latter.


Is the SucCE Ubiquitous in Its Effect on 
S’s Responses? 


A second point raised by the introduc-
tion was the question of the extent to
which a CE associated with a given stim- 
ulus affects S’s responses to other stimuli. 
That is, to what extent does the CE share 
the ubiquitous character found in the par-
tial reinforcement effect on extinction?

In terms of committed speed immedi- 
ately following the shift, performance on 


the LFA was completely influenced by the 
reward decrement on the MFA—as com- 
mitted speed decreased on the MFA, an 
immediate and identical decrease occurred 
on the LFA. This fact is illustrated in 
Figure 17. It was established that the trial- 
by-trial decline in speed during the first 
Postshift day was greater for Ss in Group 
12-1; for the Groups x Trials interaction, 
F (2,112) = 9.43, p < .001. This is to be 
expected since reward was decreased only 
in Group 12-1, not in Group 1-1. Of greater 
importance was the apparent fact that the 
decline in LFA speed did not differ from 
that on the MFA; this was supported by 
the lack of a reliable interaction between 
Groups, Alternatives (MFA versus LFA), 
and Trials, F (2,112) = 1.87, p < .25. Thus, 
it appeared that a performance decrement 
on the LFA occurred which was parallel 
to that on the MFA, even though reward 
was reduced only on the MFA. Neverthe- 
less, it was possible that the uniform 
within-days decline in performance by 
Group 12-1 was not unique to the reward 
reduction on the MFA. To test this, the 
performance by Ss in Group 12-1 was 
compared within the last day of Preshift 
and during the first day of Postshift. The 
interaction between Days and Trials 
within Days, F (2,56) = 2.83, .05 < p < .10, 
reflected the lack of variation in perform-
ance within the final Preshift day in con-
trast to the uniform decrement obtained
after the MFA reward was decreased. More-
over, the three-way interaction among
Trials, Alternatives, and Days did not ap-
proach statistical reliability; this supports
the contention that the common within-day
trend of MFA and LFA performance did
not differ before and after the shift.

Fig. 17. Performance within the first day of
Postshift. Data from Experiments I and II are
combined; each of Groups 1-1 and 12-1 includes 32
Ss.

Therefore, it is concluded that following
the reduction in MFA reward, the decrease 
in MFA committed speed is accompanied 
by an equivalent decrease in committed 
speed on the LFA—in spite of the fact that 
reward was not changed on the LFA. 

A corresponding effect was not obtained 
in terms of turning speed; the eventual 
MFA decrease was accompanied by ap- 
propriate LFA readjustment (increase). 
During the first day of Postshift, turning 
speed remained quite constant from trial 
to trial on both the LFA and MFA. This 
fact argues against a conclusion that 
several or all of S's responses, including 
some clearly distinct from that response 
directly instrumental to the altered re- 
ward, are affected by a single-reward 
change. Perhaps the common decline in 
LFA and MFA committed speed was due 
to the number of common stimulus ele- 
ments in these “committed” portions of the 
maze (the comparison available at the 
choice point probably reduces the impor- 
tance of this factor in terms of turning 
speed). It has been argued that this is not 
the case in the response-ubiquitous partial 


reinforcement effect (PRE) (Spear & 
Pavlik, 1966), but it must remain a possi-
ble explanation of the present results until
further tests are made. 


CE in Choice Behavior 


Finally, a CE in terms of choice be- 
havior never occurred. Three factors work 
against its occurrence. These factors include 
the spatial-temporal ordering of occurrence 
of the CE from the goal back toward the 
starting point (e.g., Spear & Spitzner, 1965; 
Vogel, Mikulka, & Spear, 1966) in combi- 
nation with the transient nature of the 
SucCE (Gonzales, Gleitman, & Bitter- 
man, 1962) and the change in LFA per- 
formance paralleling that on the MFA, 
when reward is shifted only on the MFA 
(see above). It is possible that, in any 
choice situation, the influence of the CE 
may be so weakened by the time the effect 
works its way back to the choice point 
that the influence on S's behavior there is 
relatively minor. Moreover, assuming pref- 
erence for the former LFA is contingent 
upon greater response strength to that 
side, it would appear that following the 
reward shift, either the SimCE must re- 
cover before the SucCE or the SucCE must 
have the greater effect, if a CE in choices 
is to be obtained. Neither of these events 
occurred in the present paradigm. Perhaps 
the extent of the effect of a reward shift 
on choice is eventual adjustment to the ex- 
pected (baseline) level of preference.


FRUSTRATION AND SECONDARY REINFORCEMENT CONCEPTS
AS APPLIED TO HUMAN INSTRUMENTAL CONDITIONING 
AND EXTINCTION 
LANGDON E. LONGSTRETH* 


University of Southern California


This monograph reports 3 experiments concerned with the concepts of
secondary reinforcement and frustration as they apply to human instru-
mental conditioning. A review of the literature led to 2 conclusions: (a)
there is little unequivocal evidence for secondary reinforcement at the
human level, and (b) much of the data can be interpreted as supporting
the extension of Amsel's frustration theory to human behavior. The
present experiments explored these two preliminary conclusions. Experi-
ments 1 and 2 paired 1 cue with reward (S+) and another with non-
reward (S—), and then presented the cues alone, S+ to half the Ss and
S— to the other half. S+ resulted in faster extinction as well as in other
indications of greater frustration. There was no support for secondary
reinforcement. Experiment 3 investigated reinforcement schedule and
nearness to goal as they affected speed, amplitude, and resistance to ex-
tinction. The results once again failed to confirm secondary reinforcement
predictions, and provided remarkable support for frustration theory.


The present paper reports three studies
of human instrumental learning and 
extinction. It is conceptually concerned with 
the empirical validity of two quite different 
concepts as they apply to human behavior: 
secondary reinforcement and frustration. 
Amsel (1961) and the author (Longstreth, 
1964) have both drawn attention to possible 
confusions surrounding these two concepts 
and have wondered if one concept could not 
satisfactorily account for the data usually 
ascribed to both. The paper begins by ex- 
amining the nature of this confusion. 

The basic definition of a secondary rein- 
forcer (Sr) is well known: it is a stimulus 
configuration which, through association 
with a reinforcer, acquires the capacity to 
influence preceding behavior in a way simi-
lar to that of the original reinforcer itself.
In other words, responses followed by Sr 
are strengthened (learned) or at least main- 
tained at a higher level than responses fol-
lowed by neutral stimuli not previously
paired with reinforcement. 

A number of refinements have been sug- 
gested from time to time, each attempting 
to specify more clearly those operations 
which are sufficient for Sr effects. While 
these refinements are not central to the pres- 
ent discussion, three prominent ones may 
be noted in passing. Taken in chronological 
order, the first is a discriminative stimulus 
hypothesis formulated by Schoenfeld, An-
tonitis, and Bersh (1950). Attempting to
rationalize their own failures to obtain Sr 
effects, they speculated that a stimulus 
must be a discriminative stimulus before it 
can function as an Sr. In view of its highly 
tentative nature (offered as a possible ex- 
planation of null results), it cannot be con- 
sidered a highly developed conceptualiza- 
tion. Several subsequent studies have not 
supported it (e.g., Ratner, 1956; Wyckoff,
Sidowski, & Chambliss, 1958). 

A more highly reasoned refinement is a
discrimination hypothesis formulated by 
Bitterman and his associates (Bitterman, 
Feddersen, & Tyler, 1953; Elam, Tyler, & 
Bitterman, 1954). According to this posi- 
tion, a stimulus paired with reward during
acquisition and then presented alone in ex-
tinction serves to differentiate acquisition
from extinction. To the extent the subject 
(S) discriminates this change, extinction
will be faster. Thus the opposite of Sr effects 
are predicted under certain conditions, and 
two experiments confirm the prediction. 

Finally, an information hypothesis has 
been suggested by Egger and Miller (1962). 
According to this notion, stimuli correlated 
with reward will become Sr's only if it is 
impossible to anticipate reward from other
stimuli. In other words, if a stimulus is re- 
dundant in the sense that other stimuli have 
already "informed" S of impending reward, 
it should not acquire Sr properties. Although 
data supporting this hypothesis are reported 
by the authors, a more recent study fails to 
confirm it (McKeever & Forrin, 1966). 

It is to be noted that all these elabora- 
tions share the two operations of pairing the 
to-be Sr with an established reinforcer and 
then presenting it alone. Of primary impor- 
tance to the present discussion is the fact 
that a current theory of frustration specifies 
exactly the same operations as necessary for 
creating a frustrating situation. Reference 
is made here to Amsel’s well-known frustra- 
tion theory (Amsel, 1958, 1962). According 
to this position, frustration is defined as fol- 
lows: “...Frustrative events—the absence 
of or delay of a rewarding event in a situa- 
tion where it had been present previously 
[Amsel, 1958, p. 102]." According to this 
position, a frustrative event results in an 
unconditioned aversive emotional response 
(Rf), with much the same properties as 
fear. Just as components of the fear re- 
sponse can become attached to contiguous 
stimuli, thereby resulting in conditioned 
fear, so can components of Rf become con- 
ditioned, resulting in anticipatory (condi- 
tioned) frustration (rf). The implications 
of these assumptions have been verified in 
a number of investigations, and will not be 
reviewed here. 

Let us now contrast the implications of Sr 
theory and frustration theory with a hypo- 
thetical example. Assume that rats are ex- 
posed to a successive discrimination prob- 
lem: locomotion down a white (W) alley to 
a W goal box is reinforced, and down a 


black (B) alley to a B goal box is nonrein- 
forced. After a discrimination has devel- 
oped, extinction is introduced. Half the rats 
are run in W, half in B. Which group will 
extinguish first? According to the notion of 
secondary reinforcement (and note that the 
requirements of both the discriminative 
stimulus and information hypotheses are 
fulfilled by W), W is an Sr while B is not.
Therefore Ss run in W should be more re- 
sistant to extinction. According to frustra- 
tion theory, exposure to the W goal box 
without reinforcement is frustrating, while 
exposure to the B goal box is not. The elici- 
tation of Rf, and the subsequent elicitation 
of rf in the alley, will serve to inhibit the 
instrumental response in W, much as an an- 
imal learns to inhibit responses leading to 
a fear CS (i.e., passive avoidance condition- 
ing). Thus Amsel's theory predicts faster 
extinction in W. The previously noted stud-
ies by Bitterman and his associates are very
similar to this hypothetical study and, as 
noted, the results do not support Sr predic- 
tions. As is typical of all the animal studies 
concerned with secondary reinforcement, no 
mention is made of the possible role of frus- 
tration. Mowrer (1960), however, has 
pointed out the relevance of frustration the- 
ory to these studies, noting that it very 
nicely accounts for some of the results. 
Both secondary reinforcement and frus- 
tration concepts have been applied to hu- 
man behavior as well as to animal behavior. 
As with the latter, studies with human Ss 
have been oriented towards either secondary
reinforcement, or frustration, but not both. 
Thus one line of studies has been cited as 
supporting Sr theory (a series of studies by 
Nancy and Jerome Myers are in this tradi- 
tion, a recent one appearing in 1965), while 
another group, smaller but growing, is in- 
terpreted in terms of frustration theory 
(e.g., Haner & Brown, 1955; Holton, 1961; 
Longstreth, 1960, 1965; Ryan, 1965). There 
is no conceptual overlap between these two 
groups of studies: the existence of a second 
concept with the same operational definition 
but contrary implications is all but ignored. 
Perhaps this is because most of these studies 
were not designed to simultaneously evalu- 
ate both concepts, but only one. Thus failure 
to confirm the experimental hypothesis did
not automatically suggest the operation of
an opposing concept. 

The present studies were designed to 
clearly differentiate between Sr and frustra- 
tion implications, so that the relative “truth 
value” of the two concepts could be deter- 
mined. Thus conditions were sought which 
led to clearly incompatible predictions, but 
which always involved the basic operation
of pairing a stimulus with reward and then 
presenting it alone. The first two studies are 
similar to the hypothetical rat study pre- 
viously discussed, and the third study in- 
vestigates goal gradient and partial rein- 
forcement effects as they might be predicted 
from these two formulations. 


EXPERIMENT 1: DISCRIMINATION TRAINING
FOLLOWED BY EXTINCTION TO S+ OR S—


Children were presented with a successive 
discrimination problem involving single, 
separate presentations of each of two visual 
stimuli (onset of lights varying in inten- 
sity). Turning off one of the lights resulted 
in a reinforcer (marble) while turning off 
the other light did not. Thus one of the 
lights was paired with reinforcement, as was 
the sound of the marble-ejection solenoid. 
A series of training trials was given, each 
trial consisting of presentation of one of the 
lights and its termination by S and subse- 
quent delivery of a marble following the ap- 
propriate light. The instrumental response 
consisted of pushing a joystick to the left 
to turn off one light and to the right to turn 
off the other light. Extinction was then in- 
troduced (no marbles). One group was pre- 
sented with both stimuli which previously 
had been paired with reward (the appro- 
priate light before each response and the 
sound of the solenoid afterwards), a second 
with just one (the appropriate light, with 
the solenoid turned off), and a third with 
neither (presentation of the light previously 
not paired with marbles). Resistance to ex- 
tinction was then determined, along with 
amplitude and latency measurements of 
each joystick response. 


Method 


Ss and apparatus. The Ss were 66 children from 
the second and third grades of the Manhattan 
Beach Public School System, Manhattan Beach, 


California. They were randomly assigned to three 
extinction groups of 22 children each, with 8, 10, 
and 11 males in the three groups. 

The apparatus consisted of three main units: a 
stimulus-response unit, a control unit, and a re- 
cording unit. The first unit was situated on one 
side of a gymnasium and the other two units be- 
hind a curtain at the other end of the room (ap- 
proximately 100 ft. away), so that the experimenter 
(E) could not only remain unobserved once the 
experiment started, but was also essentially incom- 
municado. 

The stimulus-response unit consisted of a large 
black box 44 in. high and 34 in. wide. It rested on 
a low table so that a 9-in. square milk glass window 
mounted on the front surface was about even with 
S's eye level. Directly below the window a joystick
handle protruded which could be turned to the 
left or right. Springs returned it to a central posi- 
tion when pressure was released. To the right of 
the window and on the front was a clear plastic 
tube mounted in a vertical position. It received 
marbles which were automatically ejected into it 
on a programmed basis. The marbles rested on top 
of each other, thus filling up the tube as they accu- 
mulated. Its capacity was 40 marbles. 

The control unit was simply a programming unit 
for stimulus and reinforcement events. The record- 
ing unit was a two-channel ink-flow Offner Dyno- 
graph. One channel was connected to a potentiom- 
eter whose resistance was determined by amount 
of displacement of the joystick, thus providing a 
permanent record of response characteristics. 
Marker pens signaled onset and offset of illumina- 
tion of the milk glass window, as well as occurrence 
of marble delivery. It was thus possible to measure 
response amplitude and latency as well as number 
(frequency) of responses. Paper speed was 10 mm/ 
sec, allowing accurate time measurements to the 
nearest tenth of a second. Amplitude was measured
in millimeters of pen deflection from a base line.
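
The chart-reading arithmetic described above can be illustrated with a minimal sketch. It assumes that latency is read as the chart distance in millimeters between the light-onset marker and the start of the joystick deflection, and that speed is the plain reciprocal of latency. The function names are hypothetical, and any scaling constant applied to the published speed scores is not stated in the text.

```python
PAPER_SPEED_MM_PER_SEC = 10.0  # chart speed given in the text


def latency_seconds(chart_distance_mm: float) -> float:
    """Latency implied by a chart distance (assumed: onset marker to response start)."""
    if chart_distance_mm <= 0:
        raise ValueError("chart distance must be positive")
    return chart_distance_mm / PAPER_SPEED_MM_PER_SEC


def response_speed(chart_distance_mm: float) -> float:
    """Speed as the unscaled reciprocal of latency (1/sec); the scaling is an assumption."""
    return 1.0 / latency_seconds(chart_distance_mm)


# One millimeter of chart corresponds to 0.1 sec, the stated measurement resolution.
print(latency_seconds(1.0))
print(response_speed(5.0))
```

At this chart speed a reading error of one millimeter shifts latency by a tenth of a second, which is consistent with the resolution the text reports.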


Procedure 


Teachers sent Ss to the experimental room one 
at a time. They were greeted by E and informed 
that he had a marble game for them to play and 
that if they earned enough marbles, these could be 
traded for a prize. Attention was drawn to the 
stimulus-response unit and S was seated on a low 
chair in front of it. The milk glass window, pre- 
viously illuminated, was pointed out, and S was 
told the game consisted of turning it off whenever 
it came on by turning the response handle in the 
correct direction. The E demonstrated by turning
off each of two illuminations twice. He then an- 
nounced that “sometimes” a marble would be 
ejected into the plastic tube when S turned off the 
light and that when the marbles reached a marker 
on the tube S could trade them for a prize. The 
marker, a piece of tape, was placed high enough 
on the tube so that 20 marbles were required to 
reach it. The S was told he could stop whenever 
he wanted to, and following these instructions E
went behind the curtain and initiated the first
training trial. 

The experiment was programmed for two 
phases, training and extinction. During training, 36 
trials were presented, involving 18 presentations of 
each of two illuminations in the stimulus window, 
a dim and bright illumination. Illumination was 
measured by a Weston Master IV exposure meter 
held 1 ft. from the stimulus window. Indirect illu- 
mination with the stimulus window off was 0.3 ftc. 
Onset of the dim illumination (D) resulted in a 
reading of 3.1 ftc., and onset of the bright (B) il-
lumination resulted in a reading of 20 ftc. The
lights, therefore, were easily discriminable. They 
were presented in a random order with the restric- 
tion that neither intensity appear more than three 
times in a row. A correct response terminated the 
illumination immediately, and the next intensity 
was automatically presented 4 sec. later. 
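
A training order of the kind just described (18 dim and 18 bright presentations, with neither intensity appearing more than three times in a row) can be generated as in the sketch below. The text does not say how the original order was constructed; rejection sampling (reshuffling until the run restriction holds) is used here purely as an illustrative assumption, and the function names are hypothetical.

```python
import random


def max_run(seq):
    """Length of the longest run of identical consecutive items."""
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


def training_order(n_each=18, max_repeat=3, rng=None):
    """Shuffle D (dim) and B (bright) trials until no run exceeds max_repeat."""
    rng = rng or random.Random()
    trials = ["D"] * n_each + ["B"] * n_each
    while True:
        rng.shuffle(trials)
        if max_run(trials) <= max_repeat:
            return list(trials)


order = training_order(rng=random.Random(0))
print("".join(order))
```

Passing a seeded generator makes the order reproducible, which would matter if the same sequence were to be used for every S.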

The instrumental joystick response was simple 
to make, involving about a 30-degree arc in the 
correct direction to turn off the light. The amount 
of pressure required was minimal in order to re- 
duce fatigue effects, but yet strong enough that the 
handle would return promptly to a central position 
when pressure was released. A pressure of 3 lb. 
moved it a sufficient distance in either direction. It 
could be rotated a maximum of about 90 degrees 
before a blocking device stopped further move- 
ment, thus making possible the measurement of 
variations in pressure. If S moved the handle in 
the wrong direction, he was allowed to correct his 
error before proceeding to the next trial. The Ss 
typically made only one or two errors at the very 
beginning of training. 

For all Ss, termination of one light (S+) always 
resulted in automatic ejection of a marble 2 sec. 
later. Termination of the other light (S—) never 
resulted in a marble. Within each subsequent ex- 
tinction group, direction of the correct response 
was counterbalanced with respect to light intensity, 
and light intensity was counterbalanced with re- 
spect to marble ejection. 

Following the 36 training trials, extinction was 
introduced with no interruption or warning. Two 
groups were presented only with S+, but a correct 
response was no longer followed by marble ejec- 
tion. One of these groups (S+n) was exposed to 
the noise of the marble solenoid 2 sec. after a cor- 
rect response, just as during training. This mecha- 
nism was turned off for the other group (S+). The 
third group (S—) was presented with only the neg- 
ative illumination. 

The experiment was terminated when S said he 
wanted to stop (almost invariably accompanied by 
rising from his chair) or when a 10-sec. time inter- 
val intervened between light onset and a response. 
If neither criterion had been reached by 150 ex- 
tinction trials, E terminated the experiment. In 
any event, S was told it was uncertain whether he 
had won a prize or not, and he would have to wait 
until everybody had played the game before the 
“winners” could be determined. At the termination 
of the study, every S was given his choice of sev- 
eral prizes. 


RESULTS AND DISCUSSION 


Light intensity and response direction per 
se exerted no significant effects on response 
strength, so that collapsing of these counter- 
balancing subgroups was justified. Ampli- 
tude measurements, however, were affected 
by direction of the preceding response; a 
second response in the same direction was 
weaker than a response in the opposite di- 
rection. In order to remove this source of 
variation from the training data, trials were 
chosen on which the preceding response was 
in the same direction. Pairs of such trials 
were then combined to smooth the resulting 
learning curves. The ordinal values of these 
trials varied for various counterbalancing 
subgroups, but were constant across the 
three extinction conditions. Three data 
ranges were picked in order to present per- 
formance near the beginning, middle, and 
end of training. Trials 2, 3, 4, and 8 were 
used to compute the first curve point, Trials 
13-16 for the second, and Trials 31, 34, 35, 
and 36 for the last point. 

The first and third vertical panels of Fig- 
ures 1 and 2 present amplitude and speed 
data (reciprocal of response latency), re- 
spectively, for these training trials, sepa- 
rately for each extinction condition as well 
as for all three conditions combined. Figure 
1 indicates a gradual divergence in ampli- 
tude over training trials, with amplitude to 
S+ increasing relative to S—. The reliabil- 
ity of this trend was evaluated by a mixed 
three-way analysis of variance (Lindquist 
Type VI) with extinction condition as a
between-S variable and trials and stimuli 
(S+ versus S—) as within-S variables. The 
only F ratio approaching significance was 
the trials-by-stimuli interaction, with an F 
of 7.94 (df = 2/126, p < .001). It may thus
be concluded that amplitude to S+ in- 
creased relative to S— amplitude over train- 
ing trials, and that all three subsequent ex- 
tinction groups manifested the same trend. 

Speed data (Figure 2) present a similar 
picture except that the S+, S— divergence 
is not so pronounced. Analysis of variance 
indicated that while speeds to S+ were sig-
nificantly faster than speeds to S— (p <
.001), the divergence was not significant.
The first trial of the pair of trials deter-
mining the first curve point was then em-
ployed by itself to determine the first point, 
under the hypothesis that learning was ex- 
tremely rapid in this situation and that data 
more representative of the beginning of 
learning would more likely reveal a diver- 
gence if any, in fact, existed. Single-trial 
data were also employed to determine the 
other two curve points in order that distri-
butions with comparable variance be pro-
vided at all three curve points. An inspec- 
tion of the resulting curves revealed an 
increased amount of divergence from that 
depicted in Figure 2 for all three groups, and 
statistical analysis confirmed the reliability 
of this trend by yielding a significant trials- 
by-stimuli interaction (p < .05). It may 
thus be concluded that the marble was in- 
deed a reinforcer: responses followed by it 
developed greater strength and speed than 
responses not followed by it. 

The second and fourth vertical panels of 
Figures 1 and 2 present amplitude and speed 
measurements during extinction. Amplitude
is presented for the second extinction trial
and for successive fifths of extinction. The 
first extinction trial is not represented be- 
cause it involved a change in response direc- 
tion from the preceding trial for half the Ss. 
Speed data are presented only for the first 
six extinction trials because the event- 
marker pens were turned off at that point, 
making further speed measurements impos- 
sible. 

Amplitude data reveal an approximately 
linear decrease over extinction trials, with 
about the same slope for all three condi- 
tions, and with greater amplitude to S+ 
than to S—. Analysis of variance indicated 
only the trials effect to be significant (p <
.001). Speed data reveal a large drop from
the first to the second extinction trial for 
Condition S+n, a smaller drop for Condition 
S+, and no drop at all for Condition S—. 
The decrements for Conditions S+n and
S+ were significant as measured by related 
t tests (p < .05). Variances of the difference 
scores between the first two extinction trials
were also different, being 398, 716, and 246
for Conditions S+n, S+, and S—, respec-
tively (p < .05, Hartley test). These differ-
ences were maintained through Extinction
Trial 6, at which point speed variances were
1,866, 752, and 452 for Conditions S+n,
S+, and S— (F max. = 4.13, p < .01). Mean
speeds for Extinction Trials 2-6 did not
differ significantly among the three
conditions.

Fig. 1. Mean response amplitude in training and extinction for the three extinction con-
ditions separately and for all conditions combined.

Fig. 2. Mean response speed in training and extinction for the three extinction conditions
separately and for all conditions combined.
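
The F max. statistic cited for the speed variances is the ratio of the largest to the smallest group variance. The sketch below recomputes that ratio from the variances given in the text; the function name is hypothetical, and the significance levels are not reproduced because they depend on tabled critical values that the text does not supply.

```python
def f_max(variances):
    """Hartley's F max: largest group variance divided by the smallest."""
    values = list(variances)
    if not values or min(values) <= 0:
        raise ValueError("need at least one positive variance")
    return max(values) / min(values)


# Speed variances at Extinction Trial 6, as reported in the text.
trial6 = {"S+n": 1866, "S+": 752, "S-": 452}
# Variances of the difference scores between the first two extinction trials.
first_two = {"S+n": 398, "S+": 716, "S-": 246}

print(round(f_max(trial6.values()), 2))     # 4.13, matching the reported value
print(round(f_max(first_two.values()), 2))
```

The Trial 6 ratio reproduces the reported 4.13. No F max. value is printed in the text for the difference-score variances, so the second ratio is offered only as the corresponding quantity.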

TABLE 1

DISTRIBUTION OF Ss FOR NUMBER OF RESPONSES
TO EXTINCTION (n)

                 Condition
n            S+n     S+     S—
150+           4     12     16
100-149        2      2      1
50-99          6      2      2
0-49          10      6      3

Table 1 presents the number of Ss in each
group who responded less than 50, 100, or
150 times during extinction, as well as the
number who responded more than 150 times.
All Ss but one of those who extinguished 
prior to 150 responses met the criterion of
verbalizing a wish to stop, while the one ad- 
ditional S met the criterion of a 10-sec. re- 
sponse latency. These frequencies clearly 
indicate least resistance to extinction for 
Condition S+n and greatest resistance to
extinction for Condition S—. Evaluation of 
this trend was carried out by dividing each 
group into those above and below the ap- 
proximate overall median (150) and com- 
puting chi-square. A value of 13.56 was 
obtained (df = 2, p < .005). Differences be-
tween S+n and each of the other conditions
were also significant, while the difference 
between S+ and S— was not significant. 
These findings do not support predictions 
based upon secondary reinforcement theory, 
which predicts the opposite ordering of 
groups in terms of resistance to extinction. 
Considering frustration theory, the follow- 
ing interpretation is patterned closely after
Amsel's use of the theory with infrahuman
data. 
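
The median-split chi-square described above can be recomputed as a check. The sketch below assumes the split is between Ss who reached 150 responses and those who stopped earlier, taking the first count for each condition from the 150+ row of Table 1 and the remainder from the group size of 22. The function name is hypothetical and no continuity correction is applied.

```python
def chi_square(table):
    """Pearson chi-square for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


# Rows: Conditions S+n, S+, S-. Columns: (150 or more responses, fewer than 150).
counts = [[4, 18], [12, 10], [16, 6]]
stat, df = chi_square(counts)
print(round(stat, 2), df)
```

Under these assumptions the statistic comes out near 13.6 with 2 df, close to the reported 13.56; the small gap may reflect rounding in the original computation.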

Condition S+n maximized the similarity
of the extinction condition to the reinforced 
training condition, and thereby elicited the 
strongest reward anticipations (indeed, a 
number of Ss were observed to look at the 
marble ejection tube when the delivery sole- 
noid was activated). Not receiving the re- 
ward, this condition elicited maximum frus- 
tration responses. The aversive nature of 
these responses elicited avoidant response 
tendencies which then conflicted with subse- 
quent instrumental response tendencies. If 
the instrumental response then occurred, 
however, it should have been “amplified”
by the drive properties of frustration. Even- 
tually, continued exposure to nonreinforce- 
ment strengthened the avoidance response 
tendencies above those of the instrumental 
response, and extinction occurred. The same 
processes operated to a lesser extent in Con- 
dition S+, and least in Condition S—. 

How well do the facts fit the theory? The 
extinction data, of course, are in the pre- 
dicted order, although the difference be- 
tween Conditions S+ and S— is not signifi- 
cant (Experiment 2 follows up this matter). 
The predicted conflict in Condition S+n 
(and, to a lesser extent, in Condition S+) is 
also supported in terms of speed means and 
speed variability. That is, speed scores are 
conventionally used to infer conflict (e.g.,
Castaneda & Worrel, 1961; Finger, 1941), 
the assumption being that strong conflict re- 
sults in temporary response blockage, and 
hence in long latencies. Condition S+n pro- 
duced the greatest drop in starting speed 
following the first extinction trial, and Con- 
dition S— the least, thus confirming the 
prediction. Speed variability of early ex-
tinction trials was also greater for Condi-
tions S+n and S+. Amsel has interpreted
such variability to be indicative of the wax-
ing and waning of the conflicting response
tendencies, with momentary dominance of
the instrumental response producing a short-
latency response, and momentary domi-
nance of the avoidance response tendency
producing a long-latency response. Thus
speed variances as well as speed means con-
form to expectations.

The heightened drive prediction, how-
ever, was not confirmed: response ampli-
tude following the first extinction trial was
not affected by extinction conditions, and 
hence was not a function of frustration. Ex- 
periment 2 also investigated this matter in 
greater detail. 


EXPERIMENT 2: A PSEUDOREPLICATION 
OF EXPERIMENT 1 


The most impressive finding of Experi-
ment 1 was the rapid extinction under Con-
dition S+n, the condition providing both
stimuli previously paired with marble re- 
inforcement. Although this finding contra- 
dicts secondary reinforcement theory and 
supports Amsel’s frustration theory, the 
lack of a significant difference between Con- 
ditions S+ and S— is disturbing to both 
positions. Experiment 2 had as one goal a 
second estimate of the population difference 
between these two conditions. It thus con- 
sisted of two groups, S+ and S—. A pre- 
training modification was suggested by the 
possibility that light offset itself was rein-
forcing and hence masked Sr effects attrib-
utable to the marble. Although the greater
amplitude and speed scores to stimulus S+
during training argue against this possibil-
ity, it was nevertheless decided to reduce
these possible reinforcing effects by “satiat-
ing” S prior to the experiment proper. To
this end a pretraining phase was added, con- 
sisting of 120 trials (60 B and 60 D), S ter- 
minating each light by the appropriate joy- 
stick response. It was assumed that the 
novelty of the “game” would be reduced 
substantially by this exposure. 

Two other modifications had to do with 
the failure to find increased response ampli- 
tude following initial extinction trials under 
Conditions S+n and S+, that is, failure to
observe Amsel's FE. First, a number of Ss 
were responding with maximum amplitude 
prior to extinction, since very little effort 
was required to turn the handle as far as it 
would go. For these Ss, a ceiling effect was 
therefore operating, making it impossible 
for amplitude to increase with the onset of 
extinction (separate analyses of Ss not re- 
sponding at maximum still failed to show 
the FE, but the N was quite small). Tension 
on the response handle was therefore in-
creased to the point where it was rather
difficult to turn it a maximum distance.
Second, it is possible that the FE was mini- 
mized by the temporal interval between 
trials (4 sec.). Perhaps the postulated aver- 
sive motivation produced by nonreinforce- 
ment dissipated prior to the next response. 
The duration of the intertrial interval was 
thus introduced as a second independent 
variable in Experiment 2, retaining a value 
of 4 sec. for half the Ss and assuming a 
value of 2 sec. for remaining Ss. 

Two final modifications were introduced. 
First, a more complete picture of extinction 
behavior was obtained by prolonging its 
possible duration and securing speed as well 
as amplitude measurements on all extinc- 
tion responses. Second, Ss were drawn from 
a population of institutionalized mental re- 
tardates rather than from a public school 
(see Footnote 1). 


METHOD


Ss and Apparatus 


Thirty-two mental retardates served as Ss. All 
were from Clover Bottom Hospital and School, 
Donelson, Tennessee, a state institution for the
mentally retarded. They were assigned in prear- 
ranged order to one of eight conditions, the second 
16 Ss being assigned in the same order as the first 
16. Sex, CA, and MA were ignored, except that Ss 
so profoundly retarded that they could not follow 
instructions were not used. Mean CA for the two 
conditions, S+ and S—, was 21.6 and 21.1 yr., re- 
spectively, while mean MA was 6.6 and 7.4 yr., re- 
spectively. Neither of these differences approached 
significance. The number of females in conditions 
S+ and S— were eight and four, respectively. Since 
there was a nonsignificant tendency for females to 
extinguish more slowly than males, and since it was 
found that Condition S+ resulted in faster extinc-
tion, sex differences were ignored.

The apparatus was the same as used in Experi- 
ment 1. The stimulus-response unit was placed in 
a room adjacent to E's room, the two being sep- 
arated by a partition holding a one-way window. 
The control and recording units were in E's room 
and were connected to the stimulus-response unit 
by means of an electrical conduit. The springs used 
to maintain tension on the response handle were 
replaced so that about 10 lb. pressure in either di-
rection was required to rotate the handle its maxi-
mum distance.


Procedure 


An assistant brought Ss to the research building 
one at a time. After S was informed that there was 
a game for him to play, he was shown the stimulus-
response unit and told that the game consisted of
turning off the light by turning the joystick in the
correct direction. After two demonstrations with 
each light by E, the pretraining phase was insti- 
tuted, consisting of 60 presentations each of B and 
D in a mixed order, the second sequence of 60 
trials being identical to the first sequence. Except 
for three "non-offset" trials, a correct response ter- 
minated the light immediately and the next light 
was automatically presented 2 or 4 seconds later, 
depending upon the intertrial interval condition to 
which S belonged. All Ss but one were responding 
correctly by the end of pretraining, and that S was 
replaced. 

On Trials 70, 80, and 100 a correct response did 
not result in immediate offset of the light. Instead, 
the response had to be repeated two more times 
before the light terminated. The purpose of these 
trials was simply to investigate the effects of this 
kind of treatment. 

Following pretraining E entered the room and 
announced that he had a new game. The S was 
told to continue turning off the lights, but that 
"sometimes" a marble would fall down the tube 
into a cardboard box placed directly below the 
tube. At this point E produced a steel container 
completely full of 30 steel marbles. After comment- 
ing on how full the container was and making sure 
that S saw it, E put the marbles, one by one, into 
the top of the apparatus and out of sight. Then he 
placed the empty steel container beside the box at 
the end of the tube and told S to put the marbles 
back into the container whenever one fell into the 
box. He was told that if he could fill the container 
with marbles “just like it was before,” he could win 
a prize. At this point a nickel and a 5-cent bag of 
M & M candy were produced and S was asked 
which he would rather win. His choice was placed 
next to the empty steel container. After telling him 
he could quit whenever he wanted to by getting 
up and coming into the next room where E was 
waiting, E returned to the control room and ini- 
tiated the training phase. 

The preceding change in incentive conditions 
from Experiment 1 was to insure that these re-
tarded Ss would notice the pairing of S+ and mar-
bles; thus they were forced to handle each marble
instead of perhaps looking at it in the tube. 

Training consisted of 18 presentations of each 
light with the same mixed sequence for all Ss, just 
as in Experiment 1. Ss in the 4-sec. intertrial in- 
terval condition received the marble two sec. after 
responding to S+, and Ss in the 2-sec. condition 
received the marble 1 sec. after a response to S+. 

Extinction followed training with no interrup- 
tion. Half the Ss were exposed only to S—, and 
the other half were exposed to the following se- 
quence: S+, S+, S—, S—, S+, etc., that is, two 
presentations of each stimulus followed only by 
S+. If S had not actually walked out of the room 
by 54 extinction trials, E entered the room and 
said, “Do you want to quit or play some more?” 
If there were any questions E said, “You can play 
as long as you like.” If S indicated he wanted to 
continue, E returned to the control room. This
process was repeated at the end of 154 trials if S
had not extinguished. If S did not quit at that 
point, he was allowed to continue uninterrupted 
until 750 trials had occurred (N = 2) or until the 
next S arrived (N = 2). The artificial stopping of 
these four Ss was, of course, taken into account in 
the analysis. The purpose of the interruptions de- 
mands some explanation. It was felt that making 
it “easy” to stop served two purposes. First, it 
counteracted any tendency these Ss may have 
learned in the institution to obey orders, a tend- 
ency which seemed predominant at this institu- 
tion. Second, it served to reduce the possibility 
that S would either not notice or forget the fact 
that it was acceptable behavior to stop respond- 
ing when he wished. 
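
The stimulus order during extinction can be sketched as a generator. This is only an illustration of the sequence described above (S− throughout for one group; S+, S+, S−, S−, then S+ only for the other); the function name and the string labels are hypothetical, and the constants simply restate the check-in trials and the trial ceiling given in the text.

```python
from itertools import islice

CHECK_IN_TRIALS = (54, 154)  # trials after which E asked whether S wanted to quit
MAX_TRIALS = 750             # ceiling for Ss who never quit


def extinction_stimuli(condition):
    """Yield the stimulus presented on each successive extinction trial."""
    if condition == "S-":
        while True:
            yield "S-"
    elif condition == "S+":
        # Trials 1-2 present S+, Trials 3-4 present S-, then S+ only.
        yield from ["S+", "S+", "S-", "S-"]
        while True:
            yield "S+"
    else:
        raise ValueError("condition must be 'S+' or 'S-'")


first_six = list(islice(extinction_stimuli("S+"), 6))
print(first_six)  # ['S+', 'S+', 'S-', 'S-', 'S+', 'S+']
```

Writing the sequence out makes clear that Extinction Trials 3 and 4 are the only S− presentations in Condition S+.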

The three factorially combined variables, then,
were light-marble condition (B+ or D+), inter-
trial interval together with reinforcement interval
(4 and 2 sec., respectively, for one condition, and
2 and 1 sec., respectively, for the other), and ex-
tinction condition (S+ or S−). There were four
Ss in each of the eight cells.


RESULTS 


The overall median number of responses 
to extinction for all 32 Ss was 60. The me-
dians for extinction Conditions S+ and S−
were 38 and 117, respectively, yielding the 
contingency relationship described in Table 
2. Chi-square, corrected for continuity,
equaled 4.05, df = 1, p < .05, indicating 
faster extinction under Condition S+. The 
four Ss stopped artificially did not affect 
these values since they all responded more 
than 400 times (three were in Condition 
S—). The direction of the difference ob- 
served in Experiment 1 between these two 
conditions was therefore replicated, this 
time reliably. 

TABLE 2

NUMBER OF Ss IN CONDITIONS S+ AND S−
ABOVE AND BELOW THE GENERAL MEDIAN
NUMBER OF RESPONSES TO EXTINCTION

Median number of             Condition
responses to extinction      S+      S−
60+                          13       3
59−                           3      13

FIG. 3. Mean response speed (100/t sec.) in
training (N) and extinction (n) for the two ex-
tinction conditions, S+ and S−.

Experiment 2, like Experiment 1, thus did
not yield Sr effects, but rather the opposite.
While these results are consistent with frus-
tration theory, more demanding evaluations
can be made by examining amplitude and
speed data, as in Experiment 1. Compared
to S−, the S+ condition would be expected
to (a) produce a greater decrement in start- 
ing speeds after the first extinction trial; 
(b) produce greater speed variability on 
early extinction trials; and (c) produce an 
increment in response amplitude, followed 
by a decrement on later extinction trials. 
Figure 3 presents mean response speeds
during training and the first 19 extinction
trials, and Figure 4 presents amplitude
data.² The intertrial interval (ITI) condi-
tions of 2 and 4 sec. are combined in these
figures, even though the 2-sec. interval re-
sulted in consistently greater frustration ef-
fects as subsequently described. The train-
ing data are similar to those of Experiment
1: both speed and amplitude curves indicate
discrimination between S+ and S−, with
greater speeds and amplitude to S+. The di-
vergence in the speed curves was not signifi-
cant unless pretraining trials were included
(represented by “pre” in Figure 3), confirm-
ing the results of Experiment 1 in suggest-
ing that speed is a more sensitive index of
discrimination than amplitude in the pres-
ent situation.

² Extinction data in Figures 3 and 4 are limited
to 19 trials because the first S in Condition S−
extinguished at that point. Two Ss in Condition
S+ who extinguished prior to 19 trials are ex-
cluded in order to preserve the continuity from
training to extinction. The nature of the curves
is not altered by exclusion of these data.

FIG. 4. Mean response amplitude (in milli-
meters) in training (N) and extinction (n) for the
two extinction conditions, S+ and S−.

Turning next to response speeds in extinc-
tion, Figure 3 indicates that the onset of ex-
tinction resulted in a greater decrement in
Condition S+ than in S—, as predicted. This 
trend was evaluated by calculating speed 
differences between the first two extinction 
trials and performing a t test on the two
resulting means. The difference of these dif- 
ferences was significant (p < .05), provid- 
ing statistical support for the prediction and 
confirming the results of Experiment 1. 

An unexpected finding is the gradual in- 
crease in speed during extinction for Condi- 
tion S—. This increase from the end of 
training to Extinction Trial 19 is signifi-
cantly greater than the corresponding
change in Condition S+ (p < .05). Indeed,
by the nineteenth extinction trial Ss in Con- 
dition S— were responding with a speed 
equal to that elicited by the S+ stimulus at 
the end of training. This pattern persisted 
throughout extinction, in spite of the 
greater fatigue effects under Condition S—, 
where a greater number of responses oc- 
curred. 

Speed variability may be considered next. 
As the irregular extinction curve in the top 
half of Figure 3 suggests, Condition S+ pro- 
duced greater variability than Condition 
S—. Evaluation of this difference necessi- 
tated the matching of Conditions S+ and 
S— as closely as possible on the basis of 
mean speeds, due to a positive correlation 
between means and variances. Such match- 
ing was possible on three extinction trials: 
6, 9, and 18. The means and variances for 
the two conditions on these trials are pre- 
sented in Table 3. In spite of the fact that 
mean speeds are a little larger for Condition 
S—, thus biasing variances in the direction 
of greater S— variability, the table shows 
larger variances for Condition S+ in all 
three comparisons, and in two the difference 
is significant. Speed variances, therefore, 
conform to expectations, suggesting greater 
conflict in Condition S+.
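
The size of the variance differences in Table 3 can be read off as simple variance ratios. The sketch below only forms the ratio of the larger to the smaller variance on each matched trial; the text does not name the test used for these comparisons, so no significance decision is computed, and the variable and function names are illustrative.

```python
# Speed variances from Table 3: trial -> (variance for S+, variance for S-).
table3_variances = {
    6: (8195, 2783),
    9: (13541, 2894),
    18: (19775, 11803),
}


def variance_ratio(var_a, var_b):
    """Ratio of the larger variance to the smaller one."""
    return max(var_a, var_b) / min(var_a, var_b)


ratios = {trial: variance_ratio(*pair) for trial, pair in table3_variances.items()}
for trial, ratio in ratios.items():
    print(trial, round(ratio, 2))
```

The two comparisons flagged as significant in the table (Trials 6 and 9) are the ones with the largest ratios, roughly 3 to 1 and 4.7 to 1, against about 1.7 to 1 on Trial 18.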

TABLE 3

SPEED MEANS (X̄) AND VARIANCES (s²) ON
EXTINCTION TRIALS (n) FOR CONDITIONS
S+ AND S−

                  Condition
           S+                S−
n       X̄      s²         X̄      s²
6      131     8,195     136     2,783*
9      141    13,541     146     2,894*
18     155    19,775     164    11,803

* p < .05.

Finally, response amplitude may be ex-
amined. Figure 4 shows that amplitude dur-
ing early extinction was considerably dif-
ferent for the two conditions, with S+
showing a large increment from the first to
the second extinction trial, and S− a slight
decrement. The S+ increment is significant
as measured by related t, t = 2.41, df = 13,
p < .05, while the decrement for S− is not
significant. The third and fourth extinction
trials for Condition S+ consisted of two 
presentations of S— and are represented by 
a dotted line. The first presentation of S— 
indicates an increment in amplitude from
the end of training, while a decrement is in-
dicated for the third extinction trial in Con-
dition S−. These differences between the
two groups are significant: when Ss were
matched on the basis of amplitude to S− at
the end of training, and using all 16 Ss in
Condition S+, Condition S+ resulted in a
larger increment from the end of training to
Extinction Trial 3 than Condition S−, re-
lated t = 2.11, df = 15, p < .05. This sta-
tistic leads to the conclusion that two pres-
entations of S+ without reward led to a
greater increment in amplitude to S− than
two preceding presentations of S−. The im-
portance of this conclusion is discussed
shortly. 

Amplitude from Extinction Trial 5 shows 
a further increment for three trials under 
Condition S+, and then a gradual decline, 
whereas a monotonic decrement throughout 
extinction is depicted in Condition S−. It
may be concluded, therefore, that response 
amplitude in Condition S+ is distributed 
across extinction trials as an inverse-U 
function, as predicted. When the data of the 
2- and 4-sec. ITI conditions are examined 
separately, the inverse-U function appears 
in both conditions, but is larger in the 2-sec. 
condition. It may thus tentatively be con- 
cluded that part of the reason it was not ob- 
served in Experiment 1 was due to the inter- 
trial interval. 


Discussion 


Several additional comments about Ex- 
periment 2 need to be made. First, although 
increased amplitude following nonreward is 
often interpreted as indicative of the nondi- 
rective motivational properties of frustra- 
tion, an associative explanation (e.g., 
Brown, 1961) is not ruled out. That is, it 
could be argued that S preexperimentally
learned to try harder following failure, and 
hence the increased amplitude is a habit 
phenomenon rather than a motivational 
one. If such were the case, one might not ex-
pect an increase in amplitude to S−, since
the response to this cue never resulted in
“failure.” Yet an increase was observed on 
Extinction Trial 3, where S— was presented 
following two previous nonreinforced occur-
rences of S+. Furthermore, this increase
was about twice as great with a 2-sec. inter- 
trial interval as with a 4-sec. interval. Both 
findings are more consistent with a motiva- 
tional interpretation than with an associa- 
tive interpretation. 

Second, the decrement in speed under 
Condition S+ may not necessarily have 
been due to frustration-mediated avoidance 
tendencies. It was observed that Ss often 
oriented their faces toward the cardboard 
marble container after responding to S+, 
and maintained this orientation until the 
marble was delivered. In extinction, this 
posture was often maintained until the next 
presentation of S+, at which time Ss again 
looked at the stimulus window and made 
their response. Thus they often were not 
looking at the window when the stimulus 
was presented. Perhaps, then, the decrement 
in speed was the result of orienting re- 
sponses which interfered with performance 
of the joystick response. Fortunately, it was 
possible to test this argument. It will be re- 
called that a different type of “nonrein- 
forcement” was encountered on three pre- 
training trials: the stimulus light did not 
immediately terminate with a correct re-
sponse. The important point is that the lo-
cus of “nonreinforcement” on these trials
was identical to the locus of stimulus pres- 
entation; that is, the stimulus window. 
Hence orienting responses elicited by non- 
reinforcement would be expected to facili- 
tate speed of the next response, rather than 
to interfere with it. However, such was not 
the case. Mean speeds of one trial preceding 
and one trial following the first nonoffset 
trial (Trials 69 and 71) were 143 and 110, 
respectively, yielding a related t of 2.09,
df = 24, p < .05 (seven Ss who made an er- 
ror on one of the two trials were discarded). 
This decrement of 33 units is similar to the 
decrement from the end of training to the 
second extinction trial for Condition S+ (27 
units) and negates an interpretation based 
on changes in orienting responses. 

Third, and last, the gradual increase in
speed to S− during extinction is puzzling
but not entirely baffling. It may represent an
increasingly strong expectation for S+,
since Ss presumably learned during pre- 
training and training that S— was always 
followed sooner or later by S+. It is signifi- 
cant that speed showed a final decrement on 
the last two-fifths of extinction, suggesting 
that these expectations finally weakened, 
contributing to extinction. 


EXPERIMENT 3: THE EFFECTS OF REINFORCEMENT
SCHEDULE AND EXTINCTION BLOCK POINT
ON INSTRUMENTAL BEHAVIOR


Experiments 1 and 2, taken as a whole, 
provide strong support for Amsel's frustra- 
tion theory as it might be applied to human 
behavior. As far as secondary reinforce- 
ment is concerned, one may conclude that 
at best it was not demonstrated and at 
worst it was disconfirmed. But perhaps the 
paradigm and specific procedures used in 
these studies produced idiosyncratic results. 
It was therefore decided to approach the 
problem from a different point of view. The 
strategy was the same: to consider implica- 
tions of frustration and secondary rein- 
forcement theory and to oppose them, if 
possible, in a single experiment. 

Amsel has used his theory to explain both 
training and extinction data obtained from 
a partial reinforcement schedule. The perti- 
nent facts are these: in training, partial 
reinforcement results in (a) initially slower 
speed, (b) final higher speeds (if more than 
30-40 trials are given), and during extinc- 
tion in (c) greater resistance to extinction. 
There is some evidence that (a) and (b) are
dependent upon where measurements are 
taken in the response sequence; speeds re- 
corded in the early or middle parts of the 
locomotor sequence confirm these findings, 
while measurements at the end of the se- 
quence, for example, in the goal area, sug- 
gest faster speeds for 100% reinforcement 
throughout training; that is, such measure- 
ments support (a) but not (b) (Amsel, 
1964; Wagner, 1961). It was decided to test 
Amsel's formulation at the human level by 
attempting to reproduce these findings. The 
experimental design thus included two main
conditions, a 100% and a partial reinforce-
ment schedule. In order to investigate the
dependency of (b) upon the particular re- 
sponse segment measured, the instrumental 
sequence was divided into four discrete 
parts, with amplitude and speed measured 
independently for each one. More specifi- 
cally, a trial consisted of successive presen- 
tations of four light intensities, progressing 
from B to D for half the Ss and from D to 
B for the other half. Just as a rat “termi- 
nates” each segment of the alley with the 
same locomotor response, so was each light 
intensity terminated with the same joy- 
stick response: a movement to the left. A 
marble followed the last response, and then 
the first light in the sequence was presented 
again. Half the Ss obtained a marble after 
every sequence, and half after only some 
sequences, as subsequently described. 

Secondary reinforcement implications in 
a sequential task were first investigated at 
the human level by Lambert, Lambert, & 
Watson (1953). Children learned to turn
a crank and to insert tokens in order to 
earn candy at the end of the instrumental 
sequence. They were then extinguished at 
different “block points” from the “goal”; 
some Ss earned tokens and were exposed to 
other stimuli in the sequence while other 
Ss did not; they turned the crank but pro- 
gressed no further. The reasoning was that 
Ss extinguished near to the goal would be 
exposed to secondary reinforcement from 
the cues at the end of the sequence, and thus 
would be more resistant to extinction than 
Ss extinguished further from the goal. The 
results, unfortunately, were exactly the op- 
posite, and dramatically so: Ss extinguished 
near the end of the sequence extinguished 
significantly faster, and with practically no 
overlap with Ss extinguished near the be- 
ginning of the sequence. A second study, 
with some changes to control for an "un- 
controlled” variable, found opposite results, 
but the differences were not nearly so dra- 
matic. 

It was decided to repeat the essential 
features of the Lambert et al. study, since 
such data are obviously relevant to both 
frustration and secondary reinforcement 
concepts. Therefore one extinction condi- 
tion (1) involved presentation of only the 


E—————— 


| 


Frustration AND SECONDARY REINFORCEMENT 


first light in the sequence (S1), and this 
light was presented over and over until S 
extinguished. A second condition (4) in- 
volved presentation of only the last light in 
the sequence (S4), and a third condition 
(1234) involved presentation of the entire 
sequence (S1-S4) until S extinguished. A 
frustration point of view would predict 
greatest frustration for Conditions 1234 and 
4, since these conditions are more similar to 
the reinforcement situation than is Condi- 
tion 1. Contrary to Sr predictions, these con- 
ditions would thus be expected to produce 
faster extinction and to produce other indi- 
cations of frustration as well: temporary 
increases in amplitude, decrements in speed, 
and greater speed variability. 

The experimental design was thus a 2 × 2 × 3
factorial one, consisting of reinforce-
ment schedule (100% or partial), light se-
quence (B to D or D to B), and extinction
block point Conditions 1, 4, or 1234. 


METHOD 


Ss and Apparatus 


Sixty mental retardates from Clover Bottom 
Hospital served as Ss. None had been exposed to 
the apparatus before. They were exposed to 1 
of the 12 conditions of the experiment in the 
following way: the first 2 served in Condition 1; 
the next 2 in Condition 2; etc.; until four Ss 
had served in each condition. Twelve remaining 
Ss were then assigned 1 to each condition, yield-
ing a total of 60 Ss. Eight others were rejected and
replaced, four because of apparatus failure, two 
because of extreme slowness in motor behavior, 
one because of epilepsy, and one because of poor 
eyesight. 
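
The 2 × 2 × 3 design and the assignment order can be sketched as follows. The text does not say how the 12 conditions were numbered, so the ordering produced by `itertools.product` is an assumption, as is the reading that Ss were taken in pairs through two passes of the 12 conditions (four Ss per condition) before the last 12 Ss were assigned one per condition. All names are illustrative.

```python
from collections import Counter
from itertools import product

schedules = ["100%", "partial"]
light_sequences = ["B to D", "D to B"]
block_points = ["1", "4", "1234"]

# The 12 cells of the 2 x 2 x 3 factorial design (ordering is hypothetical).
conditions = list(product(schedules, light_sequences, block_points))


def assignment_order(conditions):
    """Condition served by each successive S under the assumed reading."""
    order = []
    for _ in range(2):                  # two passes through the conditions
        for cond in conditions:
            order.extend([cond, cond])  # two consecutive Ss per condition
    order.extend(conditions)            # remaining 12 Ss, one per condition
    return order


order = assignment_order(conditions)
per_cell = Counter(order)
per_major = Counter((schedule, block) for schedule, _, block in order)
print(len(conditions), len(order))                      # 12 60
print(set(per_cell.values()), set(per_major.values()))  # {5} {10}
```

Collapsing over light sequence gives the six schedule-by-block-point groups of 10 Ss each.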


As in Experiment 2, subject factors were ignored 
in the selection of these Ss. Table 4 presents 
means and SDs of CA, MA, and IQ for the six 
major conditions of the experiment (counterbal- 
ancing of light intensity and response sequence 
is not included), as well as sex distribution. It 
indicates that MA and IQ were highly similar for 
all six conditions, and that CA was similar for five 
of the conditions, but was only 19.2 for the sixth 
condition. Analysis of variance, however, indicated 
that CA differences were not significant for either 
experimental variable depicted in Table 4, nor for 
their interaction. 

As a final check on the possible role of MA, 
four Ss were selected from each of the six cells in 
Table 4, one pair representing the highest MAs 
and the other pair the lowest MAs in that condi-
tion. Two groups of 12 were thus formed, repre-
senting the extremes in MA, with averages of
5.2 and 8.6 years. Mean number of responses to
extinction for these two groups were 174 and 178, 
respectively, a difference too small to merit sta- 
tistical evaluation (the ranges were 12-750 and 
6-750). 

Sex was not evenly distributed in the six con- 
ditions, although marginal totals reveal that it 
was evenly distributed for either experimental 
variable. The mean number of responses to ex- 
tinction for the 31 males was 200, and the 29 
females, 212, a difference not approaching signifi- 
cance. 

The apparatus was exactly the same as used in 
Experiment 2, except that a modification in the 
incentive conditions was instituted. These condi- 
tions are subsequently described. 


TABLE 4

MEANS (X̄) AND STANDARD DEVIATIONS (SD) OF CA, MA, AND IQ, AND SEX DISTRIBUTION,
FOR REINFORCEMENT SCHEDULE (% REINFORCEMENT) AND EXTINCTION BLOCK POINT

                              Extinction block point
% reinforcement        1234                4                   1
                   CA    MA    IQ      CA    MA    IQ      CA    MA    IQ
100%
  X̄               26.4   7.0  45.6    23.0   6.0  42.3    25.8   6.5  42.9
  SD              [illegible in source]
  Sex             7M, 3F              6M, 4F              3M, 7F
54%
  [rows illegible in source]


Procedure


As E and S entered the experimental room, E
said he had a present for S. Inside the room a
large, heavy Christmas package was sitting on a
chair, attractively wrapped in gay colors and
tied with a pretty ribbon. E handed it to S, tell-
ing him it was his. As S took it (and thus felt how
heavy it was), E pulled it away and added, “Be-
fore you can open it, I have something for you to
do.” An 8-ft. board was sitting on two chairs 
beside the stimulus-response unit. It had 63 holes 
countersunk in a wavy line to form retainers for 
marbles, and with a colored line leading from one 
hole to the next, and with two lines after the last 
hole forming the outline of an arrowhead. The 
present was placed at the end of the board, so 
that the arrowhead pointed directly at it. As E 
placed the present on the board he said, “You have 
to fill all these holes with marbles, way down 
here to the very last one, before you can have 
the present. Let’s put it here behind the last 
hole. Now, see this light here in the window? (The
stimulus window had been previously turned on.) 
Well, you have to turn it off whenever it comes 
on; as soon as it comes on you turn it off. Then 
wait for it to come on again and then turn it off 
again. You turn it off by pushing this lever here. 
Watch. (E turned the lever to the left, turning off 
the light; when it automatically came on 2 sec. 
later, he turned it off again.) Now I’m going to 
tell you something else. Sometimes when you turn 
off the light, a marble will pop out of this tube 
and land in this box here. When it does, pick it up 
and put it in the next hole, and when you have 
them all filled up, you can have your present. 
Now you turn off the light to see if you can do it. 
(After S's second response a marble was auto-
matically ejected. The E uttered an exclamation
of delight, told S to pick it up and place it in the 
first hole, which S did.) Now, when you are all 
finished, or whenever you want to quit, you just 
get up and come out the door to my room down 
the hall and tell me. I'll be doing some work in
there. When you want to quit, come and tell me. 
Go ahead and begin now.” 

The E then remained in the room with S, para-
phrasing and repeating the above instructions un-
til it was obvious that S understood the nature of
the task. He remained in the room until S was 
responding with less than maximum pressure on 
the joystick, but under no conditions remained 
after the first 30 trials: all responses after the thir- 
tieth trial were performed alone. If S was re- 
sponding habitually with maximum pressure, thus 
allowing no margin for variation in amplitude, 
E said, “You do not have to push that hard; it 
is easier if you push softly," and demonstrated by 
putting his hand over S's on the response handle 
and turning the handle softly to the left. When 
E later returned to the control room he marked 
the recording paper with a pencil to indicate the 
specific trials he was in the room with S. Some- 
times he went back two or three times within the 
first 30 trials to remind S once again that he did 
not have to “push so hard." It is to be noted that 
these instructions would tend to minimize the FE, 
and thus operated against frustration predictions. 

Training. Training consisted of 61 trials of the 
S1-S4 sequence. Half the Ss received a marble at 
the end of each sequence (100% reinforcement)
and half at the end of 33 of the sequences, defining
the partial reinforcement condition (54%).
The Ss in the latter condition found 28 marbles 
in the board at the beginning of the experiment. 
Thus all Ss possessed 61 marbles at the end of 
training and needed 2 more to fill the board. The 
reinforcement schedule for the 54% condition con- 
sisted of four cycles of a 15-trial pattern and one 
final reinforcement. The reinforcement schedule 
for each 15-trial pattern was +, -, +, -, -, -,
+, +, -, +, +, +, -, -, +. Each light came on 2
sec. after termination of the preceding light, and 
the marble was ejected 1 sec. after termination of 
S4. The interval between S4 and the subsequent
onset of S1 was also 2 sec. Thus the interstimulus 
and intertrial intervals were both 2 sec. 
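
The arithmetic of the partial schedule can be checked with a minimal sketch. It assumes the third entry of the 15-trial pattern is a reinforced trial ("+"), which is the reading that yields the 33 reinforcements stated above; the variable and function names are illustrative only.

```python
# One 15-trial cycle of the partial schedule as listed in the text
# ('+' = marble delivered, '-' = no marble). The third entry is
# assumed to be '+', the reading consistent with 33 reinforcements.
CYCLE = ["+", "-", "+", "-", "-", "-", "+", "+", "-", "+", "+", "+", "-", "-", "+"]


def partial_schedule(cycles=4):
    """Four repetitions of the cycle followed by one final reinforced trial."""
    return CYCLE * cycles + ["+"]


schedule = partial_schedule()
n_trials = len(schedule)                # number of training trials
n_reinforced = schedule.count("+")      # number of reinforced trials
proportion = n_reinforced / n_trials    # proportion of trials reinforced

# Marbles held at the end of training under each schedule.
preloaded_partial = 28                  # marbles already in the board for 54% Ss
marbles_partial = preloaded_partial + n_reinforced
marbles_continuous = n_trials           # one marble per trial for 100% Ss

print(n_trials, n_reinforced, round(proportion, 2), marbles_partial, marbles_continuous)
```

Under these assumptions both groups end training with the same number of marbles, which is the purpose of the 28 marbles placed in the board beforehand for the partial group.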

Extinction. Extinction was introduced with no
interruption on Trial 62. One-third of the Ss from
the 100% reinforcement condition were presented
with only S1, another third only with S4, and
the last third with the entire sequence S1-S4. A
similar procedure was followed for 54% Ss, so that 
10 Ss were in each of the six reinforcement 
schedule-extinction conditions. The extinction cri- 
teria were more stringent than in previous experi- 
ments; S was required either to leave the room or 
to hesitate with a 60-sec. latency before the experi- 
ment was terminated. It was thus possible to em- 
ploy a number of different extinction criteria up 
to a limit of 60 sec. in order to determine their 
interrelationships. If S had not extinguished by 
116 or 294 trials, E entered the room and said, “You 
can quit if you want to. Which would you rather
do, quit or play some more?” If S did not want 
to quit E left the room. He did not enter it again 
after the two hundred and ninety-fourth response. 


RESULTS 


Training Amplitude 


Extinction conditions were combined in 
order to provide a more stable picture of 
training data as a function of reinforcement 
schedule. The first 30 trials were not ana- 
lyzed because of E's interaction with S on 
some of these trials. For Trials 31-61, trials 
were selected on which the preceding trial 
had been reinforced for all Ss. Trials follow- 
ing reinforcement were grouped into triads 
and the median amplitude recorded for each 
triad, separately for each of the four stimuli 
within a trial. Four such triads were ob- 
tained, covering Trials 34 to 58. The means 
of the medians of these triads are pre- 
sented in Figure 5 separately for the two 
reinforcement schedules. This figure indi- 
cates that amplitude across trials was quite 
stable, that it systematically varied within 
trials, and that it varied as a function of 
reinforcement schedule. Responses to the 

Fig. 5. Mean response amplitude (in mm.) over the last half of training for the two
reinforcement schedules (100% and 54%), separately for the four responses within a trial.
Each mean is based on the median value of blocks of three trials following reinforced
trials.


first three stimuli of each sequence were 
highly similar for the two conditions, con- 
sisting of a decrement from S1 to S3. Pres- 
entation of S4, however, resulted in an in- 
crement for the 100% condition while the 
54% condition resulted in a further decrement.
These data were submitted to a four-way
analysis of variance with reinforcement
schedule and extinction block point as
two between-S variables, and trial blocks 
(one to four) and stimuli (S3 and S4 only)
as two within-S variables. Inclusion of ex- 
tinction block point as a variable, even 
though this factor was not introduced until 
extinction, allowed evaluation of differences
in training amplitude as a function of
subject differences produced by assignment to
subsequent extinction conditions. Of 15 re- 
sulting F ratios, two were significant at p < 
.05. The reinforcement schedule by stimuli 
interaction F was 5.09, df = 1/54, confirm- 
ing the reliability of the differential change 
in amplitude from S3 to S4 for the two
reinforcement conditions. One triple inter- 
action reached significance, the reinforce- 
ment schedule by trial by extinction block 
point F ratio equalling 2.28, df = 6/162, p 
< .05. The nature of this interaction, obviously
due to sampling fluctuation, is described
in Figure 6, which presents amplitude
to S3 and S4 separately for each
subsequent extinction condition (only the 
first and last block of training trials are pre- 
sented in Figure 6). Conditions 1234 and 4 
show that 100% Ss were initially respond- 

Fig. 6. Mean response amplitude (in millimeters)
for trial blocks 34-39 and 56-58 for the two
reinforcement schedules (100% and 54%), separately
for the last two responses within a trial (3
and 4), and separately for extinction block point
(1234, 4, and 1).

Fig. 7. Mean response amplitude (in millimeters)
for training Trials 5-9 and 60 for the two
reinforcement schedules (100% and 54%), separately
for the last two responses within a trial
(3 and 4).


ing with greater amplitude than 54% Ss 
but that this difference decreased, while an 
opposite trend is indicated for Condition 1. 
Of greater theoretical importance is the
obvious consistency of the increase in am- 
plitude from S3 to S4 for 100% Ss: it is 
apparent on all six of the comparisons in 
Figure 6, while 54% Ss manifest an increase 
on only one of these comparisons. 

If the amplitude goal gradient developed 
as a result of the 100% reinforcement sched- 
ule, one would expect no difference in its 
form between 100% and 54% conditions 
early in training, followed by its emer- 
gence in Condition 100%. Unfortunately the 
data of early training trials are contami- 
nated by E's interactions with S as previ- 
ously noted. A careful check of the record- 
ings, however, indicated that beginning 
with Trial 5, the data of 23 Ss from 100% 
and 24 Ss from 54% were free of E’s influ- 
ence; that is, E was not in the experimental 
room. Amplitude of these Ss to Stimuli S3
and S4 for Trials 5 to 9 and 60 (the last
training trial not preceded by the interruption
of program resetting) is presented in
Figure 7 separately for the two reinforce- 
ment schedules. It is clear that (a) there is 
no difference in the form of the gradient on 
Trials 5 and 6; (b) a differential gradient 
developed on Trials 7 and 8; and (c) it was 
maintained throughout training (as exem- 
plified by Trial 60). Combining Trials 5 and 


6, and 9 and 60, analysis of variance with 
reinforcement schedule as a between-S vari- 
able and trials as a within-S variable 
yielded a significant interaction, F = 4.66, 
df = 1/44, p < .05, indicating that the 
100% goal gradient developed over training 
trials and was absent on early trials. 

It was also possible to define amplitude 
goal gradients on an individual basis and 
then count the number of Ss manifesting 
such gradients. These analyses yielded one 
expected and one unexpected result. The 
expected result was that most of these Ss 
were from the 100% reinforcement condition 
(16 of 23 Ss, p < .05). The unexpected 
finding was that seven of these Ss mani- 
fested clear gradients which subsequently 
disappeared. It would be tenuous to attrib- 
ute this to chance: in order to be scored as 
showing a goal gradient, amplitude to S4 
had to be greater than to S3 five trials in 
succession. Assuming a probability of 1⁄2 
that such would occur on any one trial, p 
= .03 that such a pattern would occur by 
chance. Figure 8 presents the mean magni- 
tude of these seven Ss to S3 and S4 on Trials 
4 to 9, 10 to 30, and 40 to 60. Means on 
Trials 4 to 9 show slightly smaller amplitudes
to S4, thus indicating that the gradients
shown on Trials 10 to 30 were developed
over trials. The gradient revealed on
Trials 10 to 30 is about the same magnitude
as that presented in Figure 7 for 100%
Ss and is significant as measured by related
t, with a value of 4.56, df = 6, p < .01.

Fig. 8. Mean response amplitude (in millimeters)
for seven Ss manifesting a “disappearing
gradient,” for Trials 4-9, 10-30, and 40-60, separately
for the last two responses within a trial
(3 and 4).

During the second half of training, this 
gradient entirely disappeared for most Ss,
as indicated by the almost complete lack
of a difference between S3 and S4 on Trials
40 to 60. The author is at a loss to explain 
this phenomenon. Perhaps these Ss became 
so “sure” of the marble that all attendant 
emotion habituated, leaving a flat gradient. 
In this respect it may be noted that six of 
the seven were in the 100% reinforcement 
condition. 
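
The chance estimate attached to the five-trial scoring criterion can be reproduced with a short sketch, assuming (as the text does) independent trials with a probability of 1/2 that amplitude to S4 exceeds amplitude to S3 on any one trial. The function name is hypothetical.

```python
from fractions import Fraction


def run_probability(p_single=Fraction(1, 2), run_length=5):
    """Probability that one specified block of `run_length` independent
    trials all show S4 amplitude greater than S3 amplitude."""
    return p_single ** run_length


p_run = run_probability()
print(p_run, round(float(p_run), 2))
```

This figure applies to one specified block of five consecutive trials; the chance of such a run appearing somewhere within a longer series of trials would be larger.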


Training Speed 


Starting speeds (reciprocals of latency) 
were computed for the same trials as de- 
scribed previously for amplitude, and are 
presented in Figure 9. The slow speeds to 
S1 are due to the time required to place the 
marble from the previous trial on the board 
and thus are not comparable to speeds in 
the remaining segments of the trial. Speeds 
to S2, S3, and S4 reveal an increase from
S2 to S4, with a tendency for speeds to be
maximum to S3 late in training. There appears
to be little difference in the shape of
the curves as a function of reinforcement 
schedule, but there does appear to be a
tendency for 54% Ss to respond faster. A
four-way analysis of variance, involving the
same variables as previously described for
the corresponding amplitude data (except
that three stimuli, S2, S3, and S4, were included
instead of just S3 and S4), yielded
only one significant F, that corresponding
to the increase in speed from S2 to S4, F =
4.27, df = 2/108, p < .025. It may thus be
concluded that starting speeds do not reveal 
a speed goal gradient which is influenced by 
reinforcement schedule, but rather that the 
same gradient is produced by both sched- 
ules, consisting of a small but significant 
rise from S2 to S4, with some tendency for 
a “peaking” to occur at S3 (a sampling of
other trials did not always reveal this 
peaking). 

To determine if the slightly faster speeds 
of 54% Ss were also apparent on early training
trials, speeds on Trials 5 to 9 were calculated,
using those Ss whose data were not
contaminated by interactions with E (i.e.,
the same Ss as previously described in the 
analysis of early amplitude data). These 
data suggested slower speeds for 54% Ss, 
and hence the late trial speeds for these Ss 
also were determined and are presented in 
Figure 10 (Trials 60 and 61 have been 

Fig. 9. ... reinforcement schedules (100% and 54%), separately for the four responses within a trial.

Fig. 10. Mean response speed (100/t sec.) for early (5-9) and late (34-61) training
trials, separately for the two reinforcement schedules (100% and 54%) and for the last
three responses within a trial (each break represents the end of a trial).


added to illustrate the stability of the 
trends). The pattern is apparent: on Trials 
5 and 6 there is little difference in speeds 
between the two reinforcement schedules; 
Trials 7, 8, and 9 reveal faster speeds for 
the 100% schedule; Trials 34 to 43 suggest 
a transition period, with no consistent differ- 
ence between groups; and Trials 49 to 61 
reveal faster speeds for the 54% schedule. 
The first and last five trial blocks in Figure 
10 were combined and submitted to analysis 
of variance, with reinforcement schedule 
as a between-S variable and trials and 
stimuli as two within-S variables. Since a 
number of significant F ratios emerged, a 


summary table of this analysis is presented 
in Table 5, and the data are presented in 
Figure 11, the second and third vertical 
panels combining trial blocks (Panel 2) as 
well as reinforcement schedule (Panel 3). 
The significant trials effect reflects the 
increase in speed from early to late training 
trials, while the significant stimuli effect re- 
flects a monotonic increase in speed from S2 
to S4 (combining both reinforcement sched- 
ules, as in Panel 3, there is no evidence of 
peaking at S3). The significant stimuli by 
reinforcement schedule interaction indicates 
that the form of the speed goal gradient 
differed for the two reinforcement schedules, 

Fig. 11. Mean response speed (100/t sec.) for early (5-9) and late (49-61) training
trials, separately for the two reinforcement schedules (100% and 54%) and for the last
three responses within a trial (2, 3, and 4). The second vertical panel combines all trials,
and the third panel combines reinforcement schedule as well as trials.

TABLE 5

ANALYSIS OF VARIANCE OF SPEEDS FOR EARLY
AND LATE TRAINING TRIALS

Source                          df      MS        F
Between Ss                      95
  Reinforcement schedule (R)     1       2        -
  Error                         94     292
Within Ss                      480
  Trials (T)                     1    1309    13.20**
  Stimuli (S)                    2      85     4.53*
  T X S                          2      37     2.45
  T X R                          1     547     5.52*
  S X R                          2      59     3.18*
  T X S X R                      2      21     1.41
  Error 1 (T & TR)              94      99
  Error 2 (S & SR)             188      18
  Error 3 (TS & TSR)           188      15

* p < .05.
** p < .01.


with a peaking at S3 for 54% and maximum
speeds at S4 for 100% (Panel 2). Finally, 
the significant trials by reinforcement 
schedule interaction reflects the differential 
increase in speeds over training trials for 
the two reinforcement schedules, with 54% 
Ss responding slower on Trials 5-9 and 
faster on Trials 49-61. Related t tests of
these within-group changes revealed that 
the change for 54% Ss was significant (p < 
.001) while the change for 100% Ss was not 
significant. 
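
The structure of the F ratios in Table 5 can be illustrated with a brief sketch. It assumes that each within-S effect is tested against the error term that names it (Error 1 for T and T X R, Error 2 for S and S X R, Error 3 for T X S and T X S X R) and that the second error term, which appears garbled in the table, is the stimuli error. Because the printed mean squares are rounded to whole numbers, the recomputed ratios only approximate the printed Fs.

```python
# Mean squares as printed in Table 5 (rounded to whole numbers there).
MEAN_SQUARES = {"T": 1309, "S": 85, "TxS": 37, "TxR": 547, "SxR": 59, "TxSxR": 21}
ERROR_MS = {"error1": 99, "error2": 18, "error3": 15}

# Assumed pairing of each effect with its error term.
ERROR_FOR = {
    "T": "error1", "TxR": "error1",
    "S": "error2", "SxR": "error2",
    "TxS": "error3", "TxSxR": "error3",
}

# F values as printed in the table, for comparison.
REPORTED_F = {"T": 13.20, "S": 4.53, "TxS": 2.45, "TxR": 5.52, "SxR": 3.18, "TxSxR": 1.41}


def f_ratio(effect):
    """Effect mean square divided by the mean square of its error term."""
    return MEAN_SQUARES[effect] / ERROR_MS[ERROR_FOR[effect]]


for effect in MEAN_SQUARES:
    print(effect, round(f_ratio(effect), 2), REPORTED_F[effect])
```

The largest discrepancy under this pairing is for the stimuli effect, which is consistent with rounding of small mean squares rather than with a different choice of error term.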

Speed data, in summary, show (a) a 
gradual increase over training trials, (b) a 
goal gradient for both reinforcement schedules,
(c) some hint of a differential gradient,
with the 54% schedule producing peaking
at S3 and the 100% schedule at S4 (this
difference achieving significance when N = 
47 but not when N = 60), and (d) 100% 
Ss responding faster on early trials but 54% 
Ss responding faster on later trials. 


Visual Goal Orientations during Training 


Eleven Ss consistently turned their heads
away from the stimulus window and towards
the marble hose just prior to marble
ejection. A notation was made on S's record
sheet of this behavior. This visual
goal orientation occurred either simultaneously
with or just after responding to S4,
but before marble ejection a second later.
Ten of these 11 Ss were exposed to the 100% 
reinforcement schedule, yielding a corrected 
chi-square of 7.12, df = 1, p < .01. Eight 
of these Ss were among the 23 manifesting 
individual amplitude goal gradients as pre- 
viously defined, resulting in a significant 
relationship, corrected chi-square of 6.96, 
df = 1, p < .01. Thus it appears that Ss who
manifested reward anticipations with visual 
orienting responses also did so with response 
amplitude and were from the 100% rein- 
forcement condition. 
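
The first of these chi-square values can be reproduced with a minimal sketch. It assumes that "corrected" refers to Yates's continuity correction and that each reinforcement schedule contained 30 Ss (10 Ss in each of three extinction conditions); the function name is hypothetical.

```python
def yates_chi_square(a, b, c, d):
    """Chi-square with Yates's continuity correction for the 2 x 2 table
    [[a, b], [c, d]], using the shortcut formula
    N(|ad - bc| - N/2)^2 / (product of the four marginal totals)."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Rows: Ss with / without visual goal orientations.
# Columns: 100% / 54% reinforcement (30 Ss per schedule assumed).
chi2_schedule = yates_chi_square(10, 1, 20, 29)
print(round(chi2_schedule, 2))
```

With these cell counts the sketch returns the value reported for the relation between visual goal orientation and reinforcement schedule.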


Extinction 


Extinction block point (Conditions 1234, 
4, and 1) constitutes a second independent 
variable. Amplitude and speed are discussed
in terms of early extinction and late extinc- 
tion and are followed by a discussion of 
resistance to extinction. Early extinction 
refers to those trials during which all Ss 
responded; that is, it includes all extinction 
trials up to the point where the first S in 
that condition stopped responding. Late ex- 
tinction refers to fifths of extinction and 
thus samples the entire extinction sequence 
of all Ss. 

Early extinction amplitude. Figures 12, 
13, and 14 present mean response amplitude 
for the three extinction conditions, sepa- 
rately for the two reinforcement schedules. 
For each condition, extinction trials are 
extended to that point where the first S 
stopped responding (12 responses in Condi- 
tion 1234, 20 in Condition 4, and 20 in 
Condition 1). In all cases, performance on 
the last training trial is presented so that 
changes from training to extinction may be 
noted. Figure 12 indicates that Condition 
1234 resulted in increased amplitude for 
100% Ss and in decreased amplitude for 
54% Ss. A three-way analysis of variance, 
with stimuli and trials as two within-S vari- 
ables and reinforcement schedule as a be- 
tween-S variable, revealed one significant F, 
that for the reinforcement schedule by trials 
interaction, F = 7.91, df = 3/54, p < .001.
The increase from Training Trial 61 to 
Extinction Trial 3 was significant for 100% 
Ss, t = 4.4, df = 54, p < .001 (utilizing
the error term from analysis of variance), 
while the decrement for the 54% Ss was not 
significant. 

Fig. 12. Mean response amplitude (in millimeters)
for the last training trial (61) and the first
three extinction trials for Extinction Block Point
1234, separately for the two reinforcement schedules
(100% and 54%) and for the four responses
within a trial (each break represents the end of a
trial).


Figures 13 and 14 do not reveal similar 
increments in amplitude for 100% Ss. In
fact, none of the differences between or 
within groups were significant except for 
the decrement over extinction trials in Con- 
dition 4 for both reinforcement schedules. 
There is a suggestion that 100% Ss in Con- 

Fig. 13. Mean response amplitude (in millimeters)
for the last training trial (61) and for the
first 20 extinction responses for Extinction Block
Point 4, separately for the two reinforcement
schedules (100% and 54%).

Fig. 14. Mean response amplitude (in millimeters)
for the last training trial (61) and for the
first 20 extinction responses for Extinction Block
Point 1, separately for the two reinforcement
schedules (100% and 54%).


dition 4 manifested an increment in ampli- 
tude between Extinction Trials 14 and 20, 
but statistical evaluation did not verify 
this observed trend. The early extinction 
amplitude data could thus be summarized 
by stating that all groups except 100% Ss 
in Condition 1234 showed either no change 
or a slight decrement in amplitude, while 
the latter Ss manifested a temporary in- 
crease. 

Early extinction speed. Figures 15, 16,
and 17 present speed data for early extinction
trials. Speed to S1 is omitted in Condition
1234, since receipt of the marble on
Trials 60 and 61 affected speeds on Trials 
61 and the first extinction trial, contributing 
unwanted variance to the analyses. These 
figures indicate that speed of 54% Ss re- 
mained fairly constant for all three extinc- 
tion conditions, but that 100% Ss in Condi- 
tions 1234 and 4 showed a marked drop. 
Statistical analysis confirmed these trends, 
indicating that the decrement for the latter 
two conditions was significant (p < .05).

Speed variability was also investigated. 
Two predictions from frustration theory
were tested. First, 100% reinforcement
should have produced more approach-avoid- 
ance conflict during early extinction than 
54% reinforcement, and this difference 
should have been maximum in Condition 
1234, since the similarity of training and 

Fig. 15. Mean response speed (100/t sec.) for
the last training trial (N) and the first three extinction
trials for Extinction Block Point 1234,
separately for the two reinforcement schedules
(100% and 54%) and for the last three responses
within a trial (2, 3, and 4).


extinction in this condition would have pro- 
duced maximum frustration. Second, con- 
sidering extinction block point, Condition 
1234 should have produced maximum con- 
flict and 1 the least, and this difference 
should have been maximum in the 100% 
reinforcement conditions, where frustration 
was presumably maximum. 

Both predictions were tested by obtaining 
variances of changes in speed from training 
to extinction, thus controlling between-S 

Fig. 16. Mean response speed (100/t sec.) for
the last training trial (N) and the first 20 extinction
responses for Extinction Block Point 4, separately
for the two reinforcement schedules (100%
and 54%).

Fig. 17. Mean response speed (100/t sec.) for the last training trial (N) and the first
20 extinction responses for Extinction Block Point 1, separately for the two reinforcement
schedules (100% and 54%).


differences due to training differences. All
Ss in the 1234 condition, 100% reinforcement
schedule (i.e., 1234, 100) were represented
for only the first 12 extinction responses
(three trials), since the first S to
extinguish did so at this point. Therefore 
each S's speed to S2, S3, and S4 on Extinction
Trial 3 was summed and subtracted
from the corresponding sum on the first 
extinction trial (S1 speeds were omitted
due to depressed speeds on Extinction Trial 
1 which resulted from handling of the 
marble on the last training trial). The vari- 
ances of these difference scores for Condi- 
tions 1234, 100; 1234, 54; and 1, 100 were 
23,013, 5,944, and 9,078, respectively. The 
ratio of the first to the second tests the 
first prediction, and yields an F of 3.87, 
df = 9/9, p < .05. The ratio of the first to 
the third tests the second prediction, and 
yields an F of 2.54, df = 9/9, p < .10. It 
may thus be concluded that both predictions 
are supported, the first more clearly than 
the second. 
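
The two variance ratios can be recomputed directly from the variances given above. The sketch below is illustrative only; the dictionary keys and function name are hypothetical, and the degrees of freedom follow from 10 Ss per condition.

```python
# Variances of the speed difference scores as given in the text,
# keyed by (extinction block point, reinforcement percentage).
VARIANCES = {("1234", 100): 23013, ("1234", 54): 5944, ("1", 100): 9078}


def variance_ratio(larger, smaller):
    """F ratio of two sample variances, larger variance in the numerator."""
    return larger / smaller


n_per_condition = 10
df = (n_per_condition - 1, n_per_condition - 1)   # 9/9

# First prediction: 100% versus 54% within Condition 1234.
f_first = variance_ratio(VARIANCES[("1234", 100)], VARIANCES[("1234", 54)])
# Second prediction: Condition 1234 versus Condition 1 within 100%.
f_second = variance_ratio(VARIANCES[("1234", 100)], VARIANCES[("1", 100)])

print(df, round(f_first, 2), round(f_second, 2))
```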

Late extinction. Figures 18 and 19 pre- 
sent amplitude and speed data, respectively, 
for the end of training and fifths of ex- 
tinction. Both figures were evaluated by a 
three-way analysis of variance, with reinforcement
schedule and extinction block
point as two between-S variables and trials 
as a within-S variable. No significant Fs 
were obtained for the amplitude data, al- 
though the increase in amplitude for 100% 
Ss in Condition 1234 is obvious. Two signifi- 
cant Fs emerged from the analysis of speed 
data, with the decrement over trials being 
significant (F = 6.30, df = 5/270, p <
.001), and the interaction of trials and extinction
block point approximating significance
at p = .10 (F = 1.57, df = 10/270).
Although of questionable statistical signifi- 
cance, this latter F is clearly suggested by 
the speed decrements for Conditions 1234 
and 4 and the relative lack of a decrement 
for Condition 1. These trends are highly
similar to speed data for early extinction 
(see Figures 15-17). 


Resistance to Extinction 


Table 6 presents mean number of re- 
sponses to extinction for four different ex- 
tinction criteria: a response latency of 10, 
20, 30, or 60 sec. (the average training latency
was about 1 sec.). Six Ss who responded
750 times and were stopped by E
were assigned values of 750 if they did not 
display a latency greater than any one of 
the four criteria under consideration. For 
example, if one of these Ss hesitated 12 sec. 
on the eighty-ninth extinction response,
but did not pause 20 sec. or longer prior to 
being stopped at n = 750, a value of 89 
was assigned for the 10-sec. criterion and a 
value of 750 for the remaining three. 
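
The scoring rule for these six Ss can be sketched as follows. The sketch assumes that a criterion is met by the first response whose latency equals or exceeds the criterion value, and that an S who never meets a criterion receives the ceiling of 750; the function name and the list representation of latencies are hypothetical.

```python
CRITERIA_SEC = (10, 20, 30, 60)
CEILING = 750   # number of responses at which E stopped the S


def responses_to_criteria(latencies, criteria=CRITERIA_SEC, ceiling=CEILING):
    """Return, for each latency criterion, the ordinal number of the first
    response preceded by a pause at least that long, or the ceiling value
    if no such pause occurred. latencies[i] is the latency (sec.) of
    response i + 1."""
    scores = {}
    for criterion in criteria:
        scores[criterion] = ceiling
        for response_number, latency in enumerate(latencies, start=1):
            if latency >= criterion:
                scores[criterion] = response_number
                break
    return scores


# The example from the text: a 12-sec. hesitation on the 89th response,
# no pause of 20 sec. or longer, and 750 responses in all.
latencies = [1.0] * 750
latencies[88] = 12.0
scores = responses_to_criteria(latencies)
print(scores)
```

The sketch covers only Ss who were stopped by E; Ss who left the room or met the 60-sec. criterion were scored by the point at which they did so.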

The pattern in Table 6 is remarkably 
consistent for all four criteria: partial reinforcement
produced greater resistance to
extinction, and a block point “far” from the 

Fig. 18. Mean response amplitude (in millimeters) for the last training trial (N) and
fifths of extinction (n), separately for reinforcement schedule (100% and 54%) and extinction
block point (1234, 4, and 1).

Fig. 19. Mean response speed (100/t sec.) for the last training trial (N) and fifths of
extinction (n), separately for reinforcement schedule (100% and 54%) and extinction
block point (1234, 4, and 1).


goal (Condition 1) produced greater resist- 
ance to extinction than a block point “near” 
to the goal (Conditions 1234 and 4). Analy- 
sis of variance indicated the partial rein- 
forcement effect (PRE) was highly signifi- 
cant (p < .01), while extinction block point 
was of borderline significance, usually yield- 
ing an F significant at about p = .10, de- 
pending upon the criterion measure. As 
further evidence of its reality, it may be 
noted that of the six Ss who responded 750 
times, five were in Condition 1. The exact 
probability of any 0, 1, 5, or more extreme 
split for 6 objects divided into three groups 
is only .053. 
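
The exact probability can be checked by enumeration. The sketch assumes that each of the six Ss is equally likely to fall in any of the three extinction conditions and that "0, 1, 5, or more extreme" means that some one condition contains at least five of the six; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def extreme_split_probability(n_objects=6, n_groups=3, threshold=5):
    """Probability that some one group receives at least `threshold` of
    `n_objects` objects when each object falls independently and with
    equal probability into one of `n_groups` groups."""
    hits = 0
    total = 0
    for assignment in product(range(n_groups), repeat=n_objects):
        total += 1
        counts = [assignment.count(group) for group in range(n_groups)]
        if max(counts) >= threshold:
            hits += 1
    return Fraction(hits, total)


p_split = extreme_split_probability()
print(p_split, round(float(p_split), 3))
```

Of the 729 equally likely assignments, 36 give a 5, 1, 0 split and 3 give a 6, 0, 0 split, which accounts for the reported value.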

It was previously noted that frequency of 
amplitude goal gradients and visual goal 
orientations were both significantly associ- 


TABLE 6

MEAN NUMBER OF RESPONSES TO EXTINCTION
TO FOUR CRITERIA

                                 Extinction block point
Reinforcement    Criterion
schedule         (sec.)          1234       4       1
100%             10                61      72     201
                 20                88      76     220
                 30                98      81     236
                 60               122      81     237
54%              10               150     162     278
                 20               209     230     300
                 30               216     246     317
                 60               224     250     318


ated with the 100% reinforcement schedule. 
Since 100% Ss also extinguished faster, one 
would expect faster extinction for Ss mani- 
festing either goal gradients or visual orien- 
tation responses. These expectations were 
confirmed; of the 23 Ss manifesting ampli- 
tude goal gradients, 16 responded fewer 
than the overall median number of re- 
sponses to extinction (124 for the 60-sec. 
criterion), yielding a corrected chi-square of 
4.51, df = 1, p < .05. Nine of the 11 Ss
manifesting visual goal orientations also ex- 
tinguished with fewer than 124 responses, 
corrected chi-square of 4.01, df = 1, p < .05. 
When schedule of reinforcement was held 
constant these relationships were not sig- 
nificant, suggesting that the preceding re- 
lationships were due to confounding with 
reinforcement schedule. The question is a 
moot one, however, since the small Ns involved
in the more refined evaluations resulted
in relatively insensitive statistical tests.
The most that can be safely concluded is 
that goal gradients and orientation responses
were associated with rapid extinction.


Discussion 


Number of responses to extinction indi- 
cates that secondary reinforcement effects 
were not demonstrated; indeed, the opposite 
was found, with Conditions 1234 and 4 pro- 
ducing faster extinction than Condition 1. 
These results are consistent with the findings
of Experiments 1 and 2, all three stud-
ies showing that presentation of cues previ- 
ously paired closely with reinforcement 
(e.g., S--n and S+ in Experiments 1 and
2, S4 in Experiment 3) resulted in faster
extinction than presentation of cues more
removed from reinforcement (e.g., S- in
Experiments 1 and 2, S1 in Experiment 3). 
As previously noted, such results are con- 
sistent with Amsel’s frustration theory. 

Experiment 3 provides a number of inde- 
pendent tests of Amsel’s frustration theory, 
providing checks on the internal consist- 
ency of the data. For example, the training 
speed data provide clear evidence of slower 
initial speeds for the 54% reinforcement 
schedule, but greater final speeds, thus confirming
similar findings with animals. According
to frustration theory, partially reinforced
Ss are frustrated by the occurrence of
nonreinforced training trials, and approach- 
avoidance conflict results, producing ini- 
tially slower speeds. With more trials, S 
learns to continue responding in the pres- 
ence of frustration cues, since such re- 
sponses are often reinforced, and other re- 
sponses are not. The avoidance tendencies 
thus become relatively weak. At the same 
time, frustration-produced motivation adds 
to the general drive level of S, producing a 
greater total motivational level than for 
100% Ss. It is the interaction of this higher 
motivational level with the relatively strong 
instrumental habit which produces the final 
greater speed for partially reinforced Ss. 
The obtained training speeds are consistent 
with such an interpretation. 

Extinction amplitudes and speeds also 
provide patterns of relationships which are 
generally consistent with frustration theory. 
Condition 1234, 100, for example, would be 
expected to produce greatest frustration 
effects early in extinction, since not only 
were reward expectations stronger for 100%
Ss (recall the amplitude goal gradients),
but Extinction Condition 1234 was maximally
similar to training conditions. The
FE was clearly demonstrated in this condi- 
tion, consisting of an increment in ampli- 
tude and a decrement in speed. Precisely 
the same results were found in Experiment 
2 for Condition S+. Condition 4, 100 would 


be expected to produce fairly strong frus- 
tration effects, too, although the absence of 
previous stimuli S1-S3 would be expected 
to work in the opposite direction to some 
unknown extent. This condition did, in fact, 
provide most evidence for the FE after 
1234, 100, consisting of a nonsignificant increment
in amplitude on Extinction Trials
14-20, and a significant decrement in speed 
during early extinction trials. It is the pre- 
diction of these response patterns between 
amplitude, speed, and number of responses 
to extinction which provides impressive 
support for frustration theory. 

The relationships between another set of 
response variables are also of interest; spe- 
cifically, between amplitude goal gradients, 
visual orienting responses, and resistance to 
extinction. A basic presupposition of Am- 
sel’s theory is that emotional reactions to 
frustration are a positive function of antici- 
patory goal responses (rg’s) occurring at the 
time of frustration. To the author’s knowl- 
edge, all evidence to date offered in support 
of this assumption is indirect in the sense 
that rg's have not been directly observed, 
but have been inferred from stimulus con- 
ditions (e.g., Amsel & Hancock, 1957, with 
animals; Longstreth 1960, 1962, with hu- 
mans). In the present study the existence 
of visual orienting responses may be taken 
as direct instances of such anticipatory goal 
responses, since that is literally what they 
were: such orientations were first elicited 
by presentation of the marble itself, later 
were elicited by preceding stimuli (S4), and
thus became anticipatory. 

The isolation of 11 Ss who consistently 
made these visual rg’s provided test condi- 
tions for two theoretical notions. First, 
Spence has theorized that rg’s possess drive 
properties; they increase the organism’s 
level of motivation (Spence, 1956). Since 
these rg’s occurred shortly before or simultaneously
with the joystick response to S4,
increased drive might have been expected to 
manifest itself in the amplitude of this 
response. In other words, more of these 11 
Ss should have manifested amplitude goal 
gradients than expected on the basis of 
the null hypothesis. It will be recalled that 
eight, in fact, did manifest a gradient, re- 
sulting in a significant relationship with 
frequency of goal gradients. Spence's theorizing,
when extended to human Ss, was supported.

Second, Amsel's assumption of a posi- 
tive relationship between rg's and frustra- 
tion-produced aversive motivation led to 
the prediction that these 11 Ss should have 
extinguished more rapidly than remaining 
Ss. It will be recalled that nine of them did 
so, providing statistically significant sup- 
port for Amsel's assumption. If, as Spence's 
theorizing suggests, amplitude goal gradi- 
ents are a reflection of anticipatory goal re- 
sponses, a second, independent test of Am- 
sel’s assumption was possible: the 23 Ss 
exhibiting such gradients could be expected 
to also extinguish rapidly. Sixteen responded 
less than the median number of responses 
to extinction, supporting this prediction (p
< .05). 
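
The significance level reported for the 16 of 23 Ss can be checked with a short calculation. The sketch below assumes a one-tailed binomial (sign) test against a chance probability of .5 of falling below the median. The text does not name the test actually used, so the choice of test and the function name binomial_upper_tail are illustrative assumptions rather than the author's procedure.

```python
from math import comb


def binomial_upper_tail(successes: int, n: int, p: float = 0.5) -> float:
    """Probability of observing `successes` or more out of `n` under chance rate `p`."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(successes, n + 1)
    )


# 16 of the 23 gradient Ss responded below the median number of extinction responses.
p_value = binomial_upper_tail(16, 23)
print(f"P(X >= 16 | n = 23, p = .5) = {p_value:.4f}")
```

Under these assumptions the upper-tail probability is about .047, which agrees with the reported p < .05.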


CONCLUDING CONSIDERATIONS


Since the reported experiments were in- 
terpreted as strongly supporting the appli- 
cation of frustration theory to human be- 
havior, and since the theory was discussed 
in detail, the final section is mainly con- 
cerned with the current status of secondary 
reinforcement. 

There are at least three criteria which 
must be met when attempting to provide an 
unambiguous demonstration of Sr effects. 
One is implicit in the very definition of sec- 
ondary reinforcement, a second has been 
discussed in a number of different contexts, 
and a third is suggested here. After briefly 
discussing each of these three, it will be 
argued that no so-called demonstration of 
secondary reinforcement with human Ss 
satisfies all three criteria. For purposes of 
communication, the three precautions are 
called the reinforcement precaution, the 
elicitation precaution, and the intensifica- 
tion precaution. 

The reinforcement precaution is implied
in the customary definition of an Sr, to the
effect that it must be paired with a reinforcer.
The problem is this: a reinforcer
may strengthen one response class, but not 
another—it may not be transsituational. A 
puff of air, for example, strengthens the 
conditioned eyeblink, but not necessarily 
the patellar reflex. A demonstration of secondary
reinforcement therefore requires
that evidence be presented showing that 
the original reinforcer does in fact 
strengthen the response which is to be fol- 
lowed by only Sr in the test phase. Failure 
to provide such evidence creates a double- 
edged sword: if Sr effects are not demon- 
strated, it can be argued that no evidence 
was presented demonstrating that Sr was 
paired with a reinforcer appropriate for 
that response in the first place, and hence 
the test was inadequate; if Sr-like effects 
are demonstrated, it can be argued that 
pairing Sr with a reinforcer was not neces- 
sarily the critical variable, since no such 
reinforcer was demonstrated, and hence 
other interpretations become reasonable. 

The elicitation precaution, elsewhere 
identified with Bitterman’s discrimination 
hypothesis (Myers, 1958), involves one 
such interpretation. According to this re- 
quirement, it must be insured that the Sr 
does not elicit the response it is assumed to 
strengthen. It is for this reason that Skinner 
box studies of Sr effects have been consid- 
ered to be of questionable value: the click 
of the food magazine, previously paired with 
food pellets and then presented alone fol- 
lowing a bar press, may elicit the next bar 
press rather than strengthen the last one 
(e.g., Bugelski, 1956). A highly related
problem is that the click may elicit approach
responses to the food cup, which then
keep S in the vicinity of the response bar, 
thus increasing the probability that a press 
will occur (e.g., Wyckoff et al., 1958). In
this case it may be said to indirectly elicit 
the response it presumably strengthens. In 
either case, secondary reinforcement proc- 
esses are clearly not required: a simple 
cueing effect accounts for the data. 

The intensification problem is suggested 
by discovery of the frustration effect: that 
removal of a reinforcer from its customary 
spatio-temporal location strengthens im- 
mediately subsequent behavior. The crucial 
consideration is that precisely the same be- 
havior can be strengthened by either sub- 
sequent presentation or preceding absence 
of reinforcement. Thus Amsel and others 
have shown in a number of double-alley 
studies that alley running initially learned 
by reinforcement in the goal boxes is further 
strengthened in Alley Two by the omission
of reinforcement in Goal Box One. Experi- 
ments 2 and 3 in the present monograph 
have demonstrated a similar phenomenon 
with human Ss, as have other studies 
(Haner & Brown, 1955; Holton, 1961; 
Longstreth, 1965; Ryan, 1965). As most 
Sr studies involve the removal of a rein- 
forcer, but presentation of most, if not all, 
other cues previously paired with reinforcement,
conditions for the FE are maximal.
Strengthening of such behavior is obviously 
due to preceding cues, and to that extent 
cannot possibly be due to Sr effects, which 
are exerted after the response. 

Turning, then, to Sr studies with human 
Ss, it may first be noted that a shift in the 
ratio of positive to negative reports oc- 
curred around 1960. Reviewing the litera- 
ture to 1958, Myers concluded, “The author 
feels that secondary reinforcement is inade- 
quately defined and inadequately demon- 
strated.... [Myers, 1958, p. 299].” Since 
then, Myers and his associates have re- 
ported a series of studies which seem to tip 
the scales in the positive direction: the 
author is aware of nine published studies 
by this group, seven of which report Sr 
effects (Fort, 1961, 1965; Leiman, Myers, 
& Myers, 1961; Myers, 1960; Myers, Craig, 
& Myers, 1961; Myers & Myers, 1962, 1963, 
1964, 1965). It may be noted in passing, 
however, that other recent studies are not 
so confirmatory. Ignoring the negative re- 
sults in the present three studies, two others 
report negative results (Kass, Wilson, & 
Sidowski, 1964; Longstreth, 1962), while 
one reports positive results (Sidowski, Kass, 
& Wilson, 1965). Further, the author is 
aware of three unpublished studies, all of 
which report negative results (Donaldson, 
1961; Estes, 1960; Hall, 1964). 

It would seem, then, that the Myers stud- 
ies afford the best opportunity to learn what 
unique procedures produce consistent Sr 
effects. Examination of the first one in the 
series will prove as instructive as any, since 
it is concluded, “The results clearly indicate 
that tokens can be established as strong sec- 
ondary reinforcers for preschool children 
[Myers, 1960, p. 177]." Preschool children 
were exposed to the following sequence:
light—button press—receipt of token—insertion
of token—button press—candy. After
20 "conditioning" trials of this sequence,
Ss were extinguished under one of the two 
following sequences: (a) light—button
press—button press—etc. (i.e., neither
token nor candy was delivered), (b) light—button
press—receipt of token—insertion
of token—button press—etc. (i.e., the token
was delivered, but not candy). It was found 
that Condition (b) resulted in significantly 
more responses during the 3½ min. of “extinction,”
leading to the quoted conclusion.

Now, what happened in this situation? 
To begin with, no acquisition data are pre- 
sented to document the claim that “condi- 
tioning” occurred. Thus, there is no evi- 
dence that candy was a reinforcer for the 
button-press response. Indeed, a control 
group did not receive candy, and yet ap- 
parently performed the same as other 
groups, since it finished the 20 training trials 
in the same length of time. The study ac- 
tually presents evidence, then, that the 
candy played no role in acquisition of the 
response which was subsequently used to 
test for Sr effects, and therefore, by defini- 
tion, the candy was not a reinforcer. Thus 
the reinforcement precaution was clearly 
not satisfied in this study. 

Next, it may be noted that during train- 
ing, receipt of the token and its insertion 
regularly preceded button pressing, and thus 
presumably became functional as cues elic- 
iting the button-press response. Presenta- 
tion of these cues during extinction, as in 
Condition (b), would therefore be expected 
to result in more button-press responses 
merely on the basis of this previous S-R 
association. Clearly, then, the tokens may 
have elicited the next response rather than 
strengthened the preceding response, particularly
in view of the previous point that
the candy was apparently not a reinforcer
to begin with. The authors seem to have 
become aware of this possibility a year 
later, since in a very similar study it is 
stated, “The Sr, when administered in extinction,
seemed to release the additional
behavior in the chain... [Myers et al.,
1961, p. 771]." If it "released" the next 
response, there is no need to assume it 
strengthened the preceding response, and
thus it may be concluded that the elicitation 
precaution was not satisfied. 

Finally, it may be noted that failure to 
obtain candy in Condition (b) was presumably
more frustrating than failure to obtain
candy in Condition (a), since (b) contained 
more of the cues previously associated with 
candy and hence presumably elicited 
stronger anticipations of candy. Since ex- 
tinction involved a free operant procedure 
rather than a discrete trial procedure, S 
was able to respond shortly after frustration 
occurred, thereby maximizing the proba- 
bility that perseverative frustration effects 
were operative during subsequent responses. 
If one of these effects was an energizing 
one, as found in the present studies, then 
the higher rate of responding under Condi- 
tion (b) may have been a frustration effect 
rather than an Sr effect. It is unfortunate 
that extinction did not extend beyond 3½
min., since the high response rate may have 
been predictive of rapid extinction, just as 
the present studies found high response 
amplitude to be associated with rapid ex- 
tinction. It must be concluded, then, that 
the intensification precaution was also not 
satisfied in this study. 

An examination of every available pub- 
lished study reporting Sr effects with human 
Ss reveals that at least one of these precau- 
tions was not met, although it is unusual to 
find a study which violated all three. It is 
on this basis that it becomes reasonable to 
argue that the phenomenon has not yet 
been clearly demonstrated with human Ss. 
Space does not permit a similar evaluation 
of the infrahuman literature. A careful look 
at these studies may well lead to a similar 
conclusion. Such a possibility is undoubt- 
edly what Amsel had in mind when he 
wrote, 


It occurred to me...that there is a similarity of 
operations... raising the possibility that at least 
some of the effects which we have been attributing 
to secondary reinforcement may actually depend 
upon the arousal of frustration and its reduction 
[Amsel, 1961, p. 35]. 


SUMMARY 


The conditions which allegedly result in 
the development of a secondary reinforcer
(Sr) are highly similar to those which result
in frustration: pair a neutral stimulus
with reinforcement a number of times and 
then present it alone. According to the no- 
tion of secondary reinforcement, such a cue 
will acquire the function of reinforcement. 
Its presentation following a response should 
therefore strengthen that response in the 
same manner that a “primary” reinforcer 
does. Such strengthening is called the sec- 
ondary reinforcement effect (Sr effect). 

According to Amsel's frustration theory, 
such a cue is not reinforcing, but on the 
contrary, elicits an unconditioned aversive 
emotional response RF. The effect of RF
upon the modification of preceding or sub- 
sequent behavior is described by Amsel's 
theory. In certain situations such effects are 
easily distinguished from Sr effects; in other 
situations the distinction is more difficult, 
leading to confusions about the roles of 
secondary reinforcers and RF. Indeed, it is
even possible that one of the two concepts 
is sufficient to account for the data often 
ascribed to the other. This monograph re- 
ports a series of studies designed to investi- 
gate the roles of these concepts in human 
instrumental conditioning. 

Experiment 1 paired two stimuli (S+ and 
n) with reinforcement and another stimulus 
(S—) with nonreinforcement. One-third the 
Ss were subsequently presented with both 
Sr’s but no reinforcement following a joy- 
stick response (Condition S+n), another 
third with one of these stimuli (Condition 
S+), and another third with S—. Condition
S+n resulted in fastest extinction, the
greatest decrement in speed following the
very first extinction trial, and greatest speed
variability early in extinction. These results
did not support secondary reinforcement
theory, but were consistent with frustration
theory except that (a) differences
between Conditions S+ and S— were generally
not significant and (b) response amplitude
changes predicted by frustration
theory were not observed. Experiment 2
replicated conditions S+ and S— and intro- 
duced some procedural changes designed to 
further investigate the relative applicability 
of frustration versus secondary reinforce- 
ment concepts. The results were completely 
in accord with frustration theory, the S+
condition resulting in faster extinction, a 
temporary decrement in speed, greater vari- 
ability in speed, and a temporary increment 
in response amplitude. 

Experiment 3 approached the problem 
from a different viewpoint, introducing two 
different variables, schedule of reinforce- 
ment (100% and 54%) and “nearness to 
goal” at the time of extinction. A number 
of relationships emerged from this study, 
which may be summarized as follows: (a) 
the 100% reinforcement schedule resulted 
in the appearance of amplitude and speed 
goal gradients, defined as increases in am- 
plitude and speed as the goal was ap- 
proached; (b) the 100% condition also 
produced more anticipatory orienting re- 
sponses toward the goal locus than the 54% 
condition; (c) the 54% condition resulted 
in slower speeds early in training but in 
faster final speeds; (d) the PRE was observed;
(e) faster extinction was obtained
for Ss extinguished "near" to the goal than 
for Ss extinguished “far” from the goal; (f) 
conditions producing the fastest extinction 
also produced temporary decrements in 
speed, increments in amplitude, and greater 
variability in speed; and (g) Ss manifest- 
ing goal gradients and anticipatory orient- 
ing responses during training extinguished 
faster than other Ss. These results were re- 
markably consistent with predictions from 
frustration theory. Sr effects were again not 
demonstrated. 

Final considerations were concerned with 
an evaluation of evidence from other ex- 
periments which had been interpreted as 
suggesting Sr effects. Three precautions 
were discussed which seemed important in 
the evaluation of these studies. Viewed in 
this light, it was concluded that Sr effects 
have not yet been clearly demonstrated 
with human Ss. 
SHORT-TERM MEMORY IN THE MENTALLY RETARDED:
AN APPLICATION OF THE DICHOTIC LISTENING TECHNIQUE¹


ALDRED H. NEUFELDT 
Psychiatric Services Branch, Department of Public Health, Regina


A series of experiments was conducted to investigate short-term memory 
(STM) in mental retardates, utilizing the dichotic listening technique 
initiated by Broadbent (1958). The primary purpose of these experiments 
was to discern whether or not STM capacity and/or strategy of encoding 
information could account for some of the differences between retardates 
and normals. Four groups of Ss were compared: two groups of retardates,
one organic (O) and one cultural-familial in nature (F), a normal
mental age control group (NMA), and a chronological age control group 
(NCA). In sum, the evidence indicated that STM capacity was indeed an 
important difference between retardates and group NCA, though the ca- 
pacity of Groups O, F, and NMA was essentially the same. Furthermore, 
both normal groups demonstrated a marked degree of flexibility in their 
adaptation of different strategies of recall to various rates of informational 
input, and ability in using more ambiguous strategies. Such flexibility was 
not found in the retardates. Differences between the two normal control 
groups, on the other hand, were indicative of the degree to which both 
memoric capacity and ability to make use of useful strategies develops in
normal individuals over time. The implications of these results were discussed.


One of the best ways to learn about a system of which little is known
is to study that system at its points of breakdown. Applying this
principle to the topic at hand it soon becomes apparent that one of the
most promising loci of investigating short-term memory (STM)² would be
the mentally retarded. The question as to why the mentally retarded are
retarded can be looked at from two points of view: either theirs is a
problem of information retrieval, or one of information acquisition. If
one holds that it is one of retrieval, this suggests that the retarded
can encode³ information as well as normals but are not able to evoke
that information again, which is essentially an untestable hypothesis.
The second proposal, that the problem is one of acquisition, suggests
that the retarded are not able to encode as much as normals, or at
least are not able to retain such information long enough for it to be
permanently stored. This is a problem of STM and hence potentially
testable. The experiments presented, then, deal with the mentally
retarded and have been designed to elucidate some of the concepts of
immediate memory.

Memory and learning. Though it has generally been conceded that memory
per se forms an integral part of any learning situation, psychologists
on the whole have traditionally espoused relatively little interest in
the former as compared to the latter. This unilateral emphasis has
slacked off considerably since the mid-50’s, however. The cogency with
which G. A. Miller (1956) and D. E. Broadbent (1958) have presented the
case for STM now has even the traditionally cautious functionalist
considering this facet of the study of memory and learning. The
importance which this work has assumed has been noted by Melton (1963),
and something of its nature can be seen in reviews by Posner (1963) and
Postman (1964).

¹The research presented here was carried out in partial fulfillment of
the PhD degree at the University of Hawaii, Honolulu. The author
expresses his appreciation in particular to Ronald C. Johnson for his
mental stimulation and evaluative criticism.

²Short-term memory can roughly be distinguished from long-term memory
as that memory lasting but a few seconds or minutes as compared to days
and weeks. A common example of short-term memory in action is the
retention of a telephone number. We remember it from the time we look
it up until it has been dialed, but seldom longer (provided, of course,
that we do not have to focus our attention on something else in
between).

³The term “encoding” is used here to refer to the taking in of
information by the organism. Osgood (1957) would refer to such a
process as “decoding.” In view of the literary definition of the prefix
“en,” however, it was felt that the former had the more proper
connotation. This usage of “encoding” agrees closely with that of James
Deese (1958, p. 241).

The rise of psychological information theory
(cf. Berlyne, 1957) presented a major
development with which to explore behavior. 
Viewing the organism in terms of informa- 
tion (stimulus) flow led to consideration of 
the organism in terms of memoric capacity.
In this fashion STM has come to be taken as 
of major importance in human information- 
processing. Information, whatever the 
source, upon entering the organism (in the 
case of exteroceptor stimulation) presumably
enters an STM system. Such information
may either be lost here due to spontaneous
decay or to interference from other
incoming stimuli, or both (which, or both,
is a theoretical issue still very much alive),
or it may be transferred to some permanent 
storage locus (long-term memory). The im- 
port of STM to any consideration of learning 
thus immediately becomes apparent in that 
an STM system can control what and how 
much information the organism encodes. 

Short-term memory. Most theorists, such 
as Osgood (1953, 1957), Broadbent (1958), 
and Miller, Galanter, and Pribram (1960), 
who consider learning and performance in 
terms of STM would picture a learning situ- 
ation something as follows: the organism is 
faced with a task which is to be learned. 
During the learning process, the organism 
is bombarded with large amounts of infor- 
mation in a short period of time. Now, pre- 
sumably, the organism which is able to store 
the most relevant information long enough 
for it to be transferred to the long-term 
memory system learns most. The learning 
problem thus becomes one of immediate 
storage capacity. Two factors which might 
affect an organism’s effective storage capac- 
ity become apparent: (a) organisms may 
differ in inherent short-term storage capac- 
ity and (b) organisms may differ in strategy 
of encoding the available information, some 
strategies being more optimal than others 
(cf. Bruner, 1957; Neufeldt, 1963).


Particularly germane to the question of 
capacity and strategy is the problem of in- 
dividual and group differences as these 
should be readily amenable to interpretation 
in terms of STM. Consider differences be- 
tween fast and slow learners, between nor- 
mals and mentally retarded, or merely the 
effect of increasing age on learning perform- 
ance in normal individuals. These are all 
problems potentially interpretable in terms 
of STM, yet traditional measures, such as 
the digit span, have revealed few differences 
between such overtly distinguishable groups. 
The immediate storage capacity of these 
various groups appears to be very much the 
same, a point rather well made by Miller 
(1956). The major point espoused by Miller 
is not, though, that the memory spans of dif- 
ferent individuals differ little but that bet- 
ter or poorer use can be made of the span one 
has. A relatively efficient strategy of en- 
coding or recoding information, for in- 
stance, can increase the apparent memory 
span, storing more information than a rela- 
tively inefficient or poor strategy can. Con- 
siderable evidence in support of the impor- 
tance of encoding strategies is available (cf. 
Neufeldt, 1963).

Besides the importance of strategies on 
increasing the apparent capacity of STM, 
however, it is also possible that inherent 
differences of capacity do exist, though not 
clearly delimited by the traditional tests of 
memory span. Consider the effect of increas- 
ing age on learning performance in normal 
people as a case in point. Results of various 
experiments would appear to be in conflict, 
some showing very little falling off with in- 
creasing age, others a considerable amount 
(cf. Welford, 1958, pp. 247 ff.). Part of such 
disagreement is in all probability due to a
lack of consistency in methods of admin- 
istration from test to test, or from time to 
time on a given test for that matter. In ad- 
dition, however, the techniques involved 
often are obtaining only a gross estimate 
of STM capacity, thus allowing for greater 
variance of results. A tool which appears to 
overcome some of these difficulties has been 
presented by Broadbent (1954). This modi- 
fied memory-span technique, termed “di- 
chotic listening,” has proven of considerable 
potential utility in the analysis of such capacity.
For example, after equating elderly
patients with and without gross memory 
disorder on digit span, Inglis and Sander- 
son (1961), using this technique, neverthe- 
less found that differences in capacity still 
did exist, as is to be expected. 

It would thus appear that some of the 
traditional measuring techniques (such as 
digit span) may, for the reasons specified 
above, not be either as useful or as accu- 
rate as the dichotic-listening technique de- 
scribed. Such would seem to be the case in 
the study of STM in the aged at least (cf. 
Caird & Inglis, 1961; Inglis, 1957; Inglis & 
Sanderson, 1961). The problem of whether 
or not these results are generalizable to other 
groups who differ in learning ability, such as 
fast versus slow learners, or normals versus 
mentally retarded, remains to be tested. As a 
measure of short-term storage capacity, the 
dichotic listening technique would seem to 
be a highly sensitive method of getting at 
such differences. Two questions of some im- 
portance arise: First, can this dichotic task 
also be used as an index of encoding strat- 
egy? If it can tap strategy as well as capac- 
ity, then we have, indeed, a useful device. 
Second, can this technique tell us anything 
about the structure of STM? To answer 
questions of this nature, we should consider 
the theorizing of Broadbent in some detail. 

The Broadbent model. Broadbent’s Per- 
ception and Communication (1958) has 
played a key role in the rapid development 
of interest in STM. Much of the experi- 
mental data marshaled by Broadbent in sup- 
port of his approach has been derived from 
the dichotic listening technique already 
mentioned. In a typical dichotic listening 
experiment a subject (S) listens to two se- 
quences of digits presented in such a way 
that one number arrives at the left ear at 
the same time that a different number ar- 
rives at the right; for example, the left hears 
637 while the right hears 194 in such a 
fashion that as the left hears “6,” the right 
hears “1,” etc. Broadbent (1954) discovered
that if such pairs of digits are presented in 
rapid succession—that is, at the rate of 
two pairs a second—S, when required to 
identify the material heard in a free recall 
manner, will tend to report first all of the 
digits presented to one ear and then the
digits presented to the other (either 637194
or 194637 for the above example). He also 
found that when S is required to report the 
first pair of digits (e.g., 61) first, then the 
second pair, and then the third (in the case 
where three pairs of digits have been pre- 
sented), recall is less successful than when S 
is permitted to give the digits heard in one 
ear and then those heard in the other. The 
first finding has been confirmed by Bryden 
(1962); the second, with slight modifica- 
tion, by Moray (1960). It is thus evidently 
easier to report the digits ear by ear than 
to report them pair by pair so long as rate of 
presentation is fairly rapid. On the other 
hand, when the material is presented slowly, 
it is more common for Ss to give the material 
in the order of arrival (Broadbent, 1954; 
Bryden, 1962).
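
The two orders of report described above can be made concrete with a small classification sketch. It takes the digits delivered to each ear and labels a free-recall report as ear order, temporal (pair-by-pair) order, or neither. The labels, the function names, and the decision to let the two members of a pair be reported in either order are assumptions made for illustration; the scoring criteria actually used by Broadbent or Bryden are not given in the text.

```python
from itertools import product


def ear_order_reports(left: str, right: str) -> set:
    """All of one ear's digits, then all of the other ear's digits."""
    return {left + right, right + left}


def temporal_order_reports(left: str, right: str) -> set:
    """Pairs reported in order of arrival; within-pair order is left free (assumption)."""
    pairs = list(zip(left, right))
    reports = set()
    for flips in product((False, True), repeat=len(pairs)):
        reports.add(
            "".join(b + a if flip else a + b for (a, b), flip in zip(pairs, flips))
        )
    return reports


def classify_report(left: str, right: str, report: str) -> str:
    """Label a recall report according to the order in which the digits were given."""
    if report in ear_order_reports(left, right):
        return "ear order"
    if report in temporal_order_reports(left, right):
        return "temporal order"
    return "other"


# Example from the text: the left ear hears 637 while the right ear hears 194.
for report in ("637194", "194637", "613974", "671394"):
    print(report, "->", classify_report("637", "194", report))
```

For the example in the text, 637194 and 194637 are labeled ear order, and 613974 is labeled temporal order.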

Broadbent (1958) has proposed a model in 
terms of sensory “channels” to account for 
these findings. He argues that the material 
arriving at one ear (channel) is attended to 
and perceived as it arrives, while the ma- 
terial arriving at the other ear is held in 
short-term storage. Once S has perceived 
all the material on the first channel, he can 
attend to the material from the second, pro- 
vided that the memory traces in the short- 
term storage system have not decayed. The 
S is unable to switch attention from ear
to ear (channel to channel) fast enough to 
assimilate all the incoming information 
when a rapid rate of presentation is used and 
so "listens" (attends) to one ear while a 
“filter mechanism" of some sort shunts the 
information coming into the other chan- 
nel into storage; thus, S reports all the 
numbers from the one ear first. If the rate 
of presentation is slowed down, S has enough 
time to shift attention channel to channel, 
and so can report the material in the order
of arrival. Bryden (1962, 1964) has con- 
sidered these various modes of recall in 
terms of strategy of recall. 
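
The account just given can be sketched as a toy simulation. This is not Broadbent's formal model. It assumes that S alternates between ears only when the interval between pairs is at least twice a hypothetical switching time, that otherwise the left ear is attended and the right ear is shunted into storage, and that a stored digit is lost if it is older than a hypothetical trace lifetime when read out. All parameter names and numerical values (switch_time, trace_lifetime, readout_time) are invented for illustration and are not estimates from the literature.

```python
def simulate_recall(
    left: str,
    right: str,
    pair_interval: float,
    switch_time: float = 0.3,     # hypothetical time to shift attention between ears (s)
    trace_lifetime: float = 1.5,  # hypothetical survival time of a stored trace (s)
    readout_time: float = 0.25,   # hypothetical time to read one digit out of storage (s)
) -> str:
    """Return the digits S reports, in order, under the toy assumptions above."""
    n = len(left)

    # Slow presentation: there is time to switch to the other ear and back
    # between pairs, so the digits are reported in order of arrival.
    if pair_interval >= 2 * switch_time:
        return "".join(a + b for a, b in zip(left, right))

    # Fast presentation: the attended (left) ear is perceived as it arrives,
    # while the right ear is held in short-term storage.
    recalled = list(left)
    last_arrival = (n - 1) * pair_interval
    for i, digit in enumerate(right):
        arrival = i * pair_interval
        readout = last_arrival + (i + 1) * readout_time
        if readout - arrival <= trace_lifetime:  # trace has not yet decayed
            recalled.append(digit)
    return "".join(recalled)


print(simulate_recall("637", "194", pair_interval=0.5))                      # fast rate
print(simulate_recall("637", "194", pair_interval=2.0))                      # slow rate
print(simulate_recall("637", "194", pair_interval=0.5, trace_lifetime=1.0))  # faster decay
```

With these illustrative values the fast rate yields an ear-order report (637194), the slow rate yields a report in order of arrival (613974), and shortening the trace lifetime causes the earliest stored digit to be lost (63794).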

The exact nature of the “filter mechanism” 
is left unspecified by Broadbent, and a con- 
sideration of its nature and locus has re- 
sulted in some disagreements (cf. Emmerich, 
Goldenbaum, Hayden, Hoffman, & Treffts, 
1965; Moray, 1960). There is, however, 
fairly good agreement that the STM system 
must contain at least two parts: (a) a 
limited capacity channel which attends to
information as soon as it is received (termed
perceptual, or P system); and (b) a storage
area which can store, for short periods of
time, such superfluous information as can- 
not immediately be carried by the P system 
(termed S system). The dichotic listening
studies of Broadbent (1954, 1956, 1957), of 
Broadbent and Gregory (1961), Moray 
(1960), and Bryden (1962, 1964) support 
these concepts as outlined. 

The dichotic technique thus seems to be 
an excellent indicator of memory capacity,
both of the S and P systems. Broadbent's 
discussion of order of recall, however, would 
have it that the structure of STM is what 
determines whether a person recalls the 
digits by ear or in temporal order. That is to 
say, the switching mechanism is what de- 
termines S's order of recall, and as was noted 
earlier, considerable disagreement with 
Broadbent has arisen over this claim. In 
view of the evidence, a better approach, it 
seems, would be to accept STM as a two- 
part system, but to view, along with Yntema 
and Trask (1963), S's recall performance 
more in terms of a search process, and, along 
with Bryden (1962), the order of recall as a 
strategy. The fact that even at fast rates of 
presentation intrusions from the second ear 
do occur in recall of the first, and vice versa 
(what Bryden would term "attempted ear 
order") would indicate that the two chan- 
nels are not totally separable as suggested 
by Broadbent. However, the ear order of re- 
call may well be the best strategy of recall 
that S can adopt. 

The problem of the retarded. From the 
studies of Broadbent (1954, 1956, 1957), Inglis
(1960), Inglis and Sanderson (1961),
and Bryden (1962, 1964) the utility of the 
dichotic listening technique in studying 
STM would seem widespread. It is capable
of picking up differences where conventional
techniques fail, and if differences are present,
as in the case of senile versus intact subjects,
can pick up these differences with a
very small population sample. The results 
appear to be generalizable to channels of in- 
put other than ear alone (cf. Broadbent, 
1957; Caird & Inglis, 1961). Research using 
this technique has led to renewed interest in 
attention and also to the postulation of a
theoretical structure of STM (Broadbent,
1958) that has generated a considerable 
amount of research. Finally, as a technique, 
it is amenable not only to the discovery of 
differences in memory capacity, but also to 
the identification of strategies used in en- 
coding information (Bryden, 1962, 1964), a 
problem most other techniques used in the 
investigation of STM find difficult to handle.

Whether or not differences in storage ca- 
pacity and/or strategy of information cod- 
ing will account for group differences, such 
as those between normals and retardates, 
remains to be seen. The hypothesis of this 
paper is, however, that these two concepts 
should go a long way towards the explana- 
tion of such differences as do exist. As was 
indicated at the outset of this chapter, the 
problems of mental retardation should make 
a good proving-ground for such an hypoth- 
esis. 

N. R. Ellis (1963) has pointed out that 
really very little is known about the STM 
of the mentally retarded, either in terms of 
strategy or capacity. On a gross level dis- 
tinctions can be made between the organic 
and cultural-familial retardates for instance 
(cf. Robinson & Robinson, 1965). When, 
however, attempts have been made to test 
for such differences with such short-term 
techniques as delayed recall, relatively few 
have been found (cf. Osborn, 1960; Weath- 
erwax & Benoit, 1957). It has furthermore 
been observed that retardates as a whole do 
about as well on laboratory tasks under 
some circumstances as do normal Ss. For 
example, Shapiro and Johnson⁴ have found 
that the mentally retarded do quite well 
on laboratory learning tasks so long as 
learning trials are distributed and extrane- 
ous “noise” in the learning system is mini- 
mal. Laboratory tasks of this nature depend, 
of course, heavily on STM. In a task such 
as mentioned above, the amount of infor- 
mation to be handled by the Ss is lim- 
ited, it can be handled successively, and 
most of it is relevant. In a scholastic setting, 
however, the information available to the 
individual is almost infinite, and only some 


4 G. M. Shapiro and R. C. Johnson. The effect of 
massed vs. distributed practice on the learning of 
bright, average and dull children matched in mental 
age. Unpublished manuscript, Honolulu, 1964. 

of it is relevant to the task of learning. Dif- 
ferences in learning in such a case could be 
thought of in terms of coding strategy— 
retardates use less optimal strategies than 
do normals, such as attempting to encode all 
the information available, or not discrimi- 
nating between relevant and irrelevant in- 
formation, and so forth. If this is the case, 
such differences should become evident with 
the dichotic listening task where two differ- 
ent sources of information are present, both 
of which are highly demanding of attention 
and where the information is presented too 
rapidly to be handled successively. In such 
a situation normals (at least those above 
chronological age [CA] 11, as demonstrated 
by Inglis & Caird, 1963) tend to report all 
the information received by one ear before 
reporting that of the other. How retardates 
respond in such a situation was not known 
and remained for the following experiments 
to discern. One might suspect, however, that 
if theirs is a problem of encoding strategy, 
as suggested above, retardates might well 
be found attempting to encode all the infor- 
mation of both ears simultaneously by shift- 
ing attention from one channel of input to 
the other. Because of the rapidity of presen- 
tation (one pair per half second), though, 
such a strategy would tend to result in a 
net loss of such information to recall. 

Finally, several studies (Hermelin & 
O'Connor, 1964; O'Connor & Hermelin, 
1965) have shown that, though the digit 
spans of retardates and normals differ little, 
a faster rate of decay of STM occurs in the 
retarded than in normal Ss. Where does 
this difference lie? Is it primarily due to 
decay in the S system as with the senile 
(cf. Inglis & Sanderson, 1961), or is there 
also a difference in capacity of the P sys- 
tem? The experiments which follow have 
been designed to measure both differences in 
strategy of attention and recall, and in 
short-term storage capacity between normal 
and mentally retarded Ss. 

EXPERIMENT I 

As a preliminary step in testing the hy- 
potheses outlined above, a study was carried 
out to ensure that the dichotic listening tech- 
nique would be applicable to the mentally 
retarded. 


METHOD 


Subjects 


Three groups of Ss matched in mental age were 
used—two mentally retarded and one normal con- 
trol. The mentally retarded Ss, obtained from 
Linekona School for Retarded Children in Hono- 
lulu, were grouped into those who were retarded 
due to organic causes, as determined by medical 
report (Group O), and those presumably retarded 
for cultural-familial reasons—at least with no 
known organic cause (Group F). Normal Ss (Group 
N) were obtained from the University Elementary 
School. The Ss in these groups ranged in mental 
age (MA) from 8 years 3 months to 11 years 6 
months with the mean IQ for each group, as meas- 
ured by the Wechsler Intelligence Scale for Chil- 
dren (WISC) as follows: 70.8 for Group O; 68.8 
for Group F; and 110.8 for Group N. No S showed 
any impairment of hearing. 


Procedure 


The apparatus used to administer the binaural 
stimuli consisted of a Sony two-channel tape re- 
corder (Model 464CS) played into a pair of Sharpe 
headphones (Model HA10). Different sets of digits, 
taken from Inglis and Caird (1963) were recorded 
on each channel as shown in Table 1. Each series 
was recorded so that two numbers, one from each 
channel, were simultaneously heard by S. The digit 
pairs within each series were recorded at the rate of 
one pair every one-half second. Care was taken to 
control the numbers on each channel for timing 
and intensity. The headphones covering Ss' ears 
were equipped so that each ear received only the 
digits from one of the two channels. 

TABLE 1 

DIGITS USED FOR BINAURAL STIMULATION IN 
EXPERIMENTS I AND II 

                   Channel 1    Channel 2 
Practice series 
  A                3            Blank 
  B                Blank        7 
  C                3            7 
Test series 
                   5            8 
                   7            6 
                   4            1 
                   6            3 
                   39           72 
                   85           17 
                   38           59 
                   65           28 
                   592          174 
                   793          462 
                   479          836 
                   584          719 
                   5638         2941 
                   9754         8362 
                   6542         7918 
                   9356         4271 
                   81342        96571 
                   74682        31579 
                   57841        29356 
                   38671        15429 

Each S, on first arriving, was seated at a table 
opposite the experimenter (E). The S was briefly 
introduced to the use of the headphones, and then, 
via headphones, was instructed following the Inglis 
and Caird (1963) instructions: “Now listen carefully. 
You are going to hear a number. I want you to tell 
me what number you hear.” Practice Series A (the 
spoken Digit 3 on Channel 1) was then played. 
If S responded correctly, the procedure was re- 
peated with Series B. If S failed to respond or 
gave the wrong number, the volume was increased 
until the correct response was made. Each S was 
then told: “Now you are going to hear two num- 
bers together, one from each ear. Tell me what 
numbers you hear." The two channels then played 
the spoken digits 7 and 3 simultaneously (Series C). 
If S responded with the correct digits (i.e., 73 or 37) 
then the test series were commenced. This pro- 
cedure provided a practice series allowing S time to 
become used to the experimental situation, and 
ensuring that group differences were not due to dif- 
ferences in sensory acuity. When S was fully 
acquainted with the procedure, the test series was 
begun in the order shown. The longest series pre- 
sented to Ss of this study was the four-pair series 
shown in Table 1. The Ss were informed of each 
change in series length by the instruction: “Now 
you are going to hear [N] numbers, [N/2] in each 
ear” (where N was 2, 4, 6, or 8). “Tell me what 
numbers you hear.” Between each of the items 
within a series Ss were asked, “Now what numbers 
do you hear?” The E recorded S's output on mim- 
eographed score-sheets for later scoring. 

FIG. 1. Mean number of digits recalled per trial for series varying in length. (Axes: digits per channel versus mean total digits recalled; curves for Groups O, F, and N, for the first and second half-spans.) 

Responses were scored following the procedure 
used by Broadbent (1954, 1956, 1957) and by Inglis 
and Sanderson (1961). The first digit repeated by S 
determined in each case which channel was taken to 
be the half-span recalled first. The score obtained 
was the average number of correct responses for 
each half-set of digits, taking each digit’s position 
in the series into account. 


RESULTS AND DISCUSSION 


Figure 1 illustrates that, using the dichotic 
listening technique, group differences in 
STM are distinguishable. Most striking to 
cursory examination is that for all groups 
the digit half-span recalled first is much su- 
perior to that recalled second. This result 
agrees with those obtained by Broadbent 
(1954) and by Inglis and Sanderson (1961), 
and concurs with the P and S models ad- 
vanced by Broadbent (1958). The dichotic 
technique thus appears to be suitable for 
more detailed study of group differences be- 
tween retardates and normals. 

The results were subjected to an arcsin 
transformation (cf. Snedecor, 1956) in order 
to obtain homogeneity of variance. A Lind- 
quist Type I (Lindquist, 1953) analysis of 
variance for between-group effects ap- 
proached significance (F = 2.77, .05 < 
p < .1), and thus was suggestive that with 
better control these groups would in fact 
differ significantly. One obvious artifact af- 
fecting these results was that no attempt 
had been made to match the groups on the 
ordinary digit span. Furthermore, Inglis and 
Caird (1963) have shown a significant effect 
of CA on immediate memory. Since the re- 
tardates were older (mean CA of Group O = 
13 years 6 months) and hence had more ex- 
perience than the normals (mean CA = 7 
years 11 months), this may have in effect 
narrowed such group differences as in fact 
may be present. In view of such confounding 
effects of CA on performance Denny (1964) 
has suggested that two normal control 
groups be used—an MA and a CA control. 
Further research should, then, not only 
match the groups on digit span, but also use 
the dual control suggested by Denny. 
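
The arcsin transformation applied to the recall scores above is commonly computed as the angle whose sine is the square root of the proportion. The sketch below assumes that form, expressed in degrees; the exact variant used in the original analysis is not stated in the text, and the function name `arcsine_transform` and the sample proportions are illustrative only.

```python
import math


def arcsine_transform(p):
    """Return arcsin(sqrt(p)) in degrees for a proportion p in [0, 1].

    Assumption: this is the common angular transformation for proportions;
    the text does not specify which variant was actually applied.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    return math.degrees(math.asin(math.sqrt(p)))


# Hypothetical proportions of digits recalled (not data from the study).
for p in (0.25, 0.50, 0.90):
    print(p, round(arcsine_transform(p), 2))
```

The transformation stretches proportions near 0 and 1, which is why it is used to make variances more nearly equal before an analysis of variance.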


EXPERIMENT II 


Experiment I presented evidence in sup- 
port of the notion that both strategy and 
storage capacity play an important part in 
differences between normal and mentally 
retarded Ss. The purpose of this experiment 
was to extend and refine that evidence by: 
studying the problem in more detail, and 
improving on its experimental design by (a) 
using a dual control as suggested by Denny 
(1964) and (b) matching the experimental 
and control groups on digit span. 


METHOD 


Subjects 


Four groups—two mentally retarded and two 
normal—of 15 Ss each were used. On the basis of 
medical evidence available in files on each S, 15 
clearly organic retardates (9 males and 6 females) 
were selected, ranging in WISC IQ from 53 to 79. 
Fifteen familial retardates were then selected, 
matched in MA and digit span, 1 for each Group 
O S. The Ss were matched in MA, but E kept 
the IQ and hence the CA of the matchings quite 
close as well. Plus or minus 3 months MA was con- 
sidered an adequate match (see Table 2 for a sum- 
mary of matching data). Items and instructions 
from the WISC digit span forward were utilized in 
obtaining a measure of each S's digit span. 

The normal control groups were matched as 
follows: Each of 15 Ss of the first group (NMA) 
was matched in MA and digit span with one of 
the O-group Ss following the same procedure used 
in the organic-familial matching above. It should 
be noted that the only intelligence ratings avail- 
able for these Ss were from the California Test for 
Mental Maturity (CTMM) regularly administered 
to local elementary school children. It was felt, 
however, that though not perfect, this rating was 
adequate as an estimation of normalcy for the MA 
control group. Each S of the second control group 
(NCA) was similarly matched with one of the O- 
group Ss, but in terms of CA rather than MA, keep- 
ing their IQ range within the normal range of plus 
or minus 1 standard deviation from the test mean. 


5 Retarded Ss were obtained from special classes 
in Kauluwela, Liluokalani, and Nuuanu Elementary 
Schools, and from Linekona School. The younger 
normal Ss were obtained from Liluokalani Ele- 
mentary School, and the older Ss from Hawaii 
Baptist Academy. The author expresses his ap- 
preciation to the principals of these Honolulu 
schools, and to Jerry Cochrane, Director of Special 
Education, Department of Education, Honolulu, 
Hawaii, whose interest paved the way. 
TABLE 2 
SUMMARY OF DATA ON WHICH GROUPS WERE MATCHED 

              CA              MA              IQ^a          Digit span 
Group     Mean    SD      Mean    SD      Mean     SD      Mean    SD 
O         13.50   2.09     9.25   1.35     69.13   6.24    4.53    .80 
F         13.08   2.10     9.25   1.38     71.13   4.14    4.67    .60 
NMA        8.83   1.37     9.25   1.40    105.27   6.76    4.73    .44 
NCA       13.50   2.01    14.39   2.56    106.86   5.33    5.60    .98 

a The IQ scores for Groups O and F are derived from the WISC, but the estimates of IQ for Groups 
NMA and NCA are based on group tests given in the schools. 


Although a match was attempted, digit span could 
not be matched as closely here as in the previous 
three groups, so that in fact the digit span of NCA 
is superior to that of all other groups (e.g., 
comparing NCA with NMA, t = 3.36, p < .05). The 
matching procedure is discussed further below. 


Procedure 


The apparatus, procedure, and instructions were 
the same as those used in Experiment I except as 
follows: (a) the items from the WISC digit span 
forward were recorded and administered via head- 
phones to all Ss, for matching purposes, before 
proceeding with the instructions of the experiment 
proper. Plus or minus one digit was allowed as ac- 
ceptable for matching the digit span of individual 
Ss in Groups F and NMA with those of Ss in 
group O; (b) whereas the one-pair series con- 
tributed little to the study of group differences, this 
material was dropped from presentation, and the 
five-pair series shown in Table 1 was added; and, 
(c) the test series were presented in a partially 
counterbalanced order—half the Ss in each group 
receiving the two-pair material first (the order 
shown in Table 1), the other half first receiving 
the five-pair material (reverse order). 
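
The matching tolerances stated in the text (MA within plus or minus 3 months, digit span within plus or minus one digit) can be written as a simple acceptance rule. The sketch below is only an illustration: the matching in the study was done by E from school records, the greedy pairing strategy is an assumption, and the record fields and sample Ss are hypothetical.

```python
def is_match(target, candidate, ma_tol_months=3, span_tol=1):
    """True if candidate is within tolerance of target on MA and digit span."""
    return (abs(target["ma_months"] - candidate["ma_months"]) <= ma_tol_months
            and abs(target["digit_span"] - candidate["digit_span"]) <= span_tol)


def match_one_to_one(targets, pool):
    """Greedily pair each target S with the first unused acceptable candidate.

    Assumption: greedy pairing is used here for brevity; it is not
    described in the source.
    """
    unused = list(pool)
    pairs = []
    for t in targets:
        for c in unused:
            if is_match(t, c):
                pairs.append((t["id"], c["id"]))
                unused.remove(c)
                break
    return pairs


# Hypothetical records: mental age in months and forward digit span.
group_o = [{"id": "O1", "ma_months": 111, "digit_span": 5},
           {"id": "O2", "ma_months": 120, "digit_span": 4}]
pool_f = [{"id": "F1", "ma_months": 125, "digit_span": 4},
          {"id": "F2", "ma_months": 113, "digit_span": 4},
          {"id": "F3", "ma_months": 122, "digit_span": 5}]

print(match_one_to_one(group_o, pool_f))
```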

Scoring. Two scoring procedures were utilized: 
(a) ear order—the procedure described and used 
in Experiment I. This procedure was used as the 
best estimate of P- and S-system capacities, follow- 
ing Broadbent (1954), Broadbent and Gregory 
(1961), and the Inglis (1961) and Inglis and Sander- 
son (1963) studies, and also as a measure of the 
degree to which the ear order strategy of recall was 
in use; (b) all other systematic orders. This second 
procedure allowed for the use of other strategies 
of recall, such as those delineated by Bryden 
(1962). Consider for example that S has heard 653 
on his left ear, and 924 simultaneously on his right. 
A normal S might respond 924653. A response of 
this nature would be scored as 3 correct for the first 
half-span recalled and 3 for the second half-span 
for both scoring procedures as the response follows 
the “ear order." Suppose, however, that a response 
is 624953 or 654923, then the “ear order" method of 
scoring would provide values of 1 and 1 for each 
half-span (2 and 2 for the latter response) as the 
first number given is taken as the indicator for the 
half-span first recalled. It can readily be seen, 
however, that this S actually reported all six digits 
correctly, but rather than following the ear order he 
switched from ear to ear in an “attempted ear 
order.” The second scoring procedure would, then, 
record this latter response as 3 and 3. One other 
systematic recall pattern would be illustrated by a 
response of 695234, or of 692534. Both these “tem- 
poral” responses would yield a score of 1 and 1 for 
Scoring I, but 3 and 3 for Scoring II. As with Bry- 
den, most of the responses in the following studies 
fell into the above three responding orders. The 
relatively small proportion of systematic responses 
remaining tended to be a minor variant of one of 
the above, involving either the repetition of a 
number, or intrusion of an incorrect number.’ Scor- 
ing II, then, involved the application of that sys- 
tematic order most closely resembling the given 
response to that response. Thus, “temporal” and 
“attempted ear order" responses were as accept- 
able as “ear order" responses. 
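
The two scoring procedures can be sketched in code using the worked example from the text (653 to the left ear, 924 to the right). Scoring I follows the description directly: the first digit reported fixes the first half-span, and each half is scored position by position. The treatment of Scoring II is an assumption made for illustration: candidate "systematic orders" are taken to be the ear orders, the "attempted ear orders" (at each serial position one ear is reported in the first half and the other ear in the second half), and the pair-by-pair temporal orders, and the best-fitting candidate is scored. All function names are hypothetical.

```python
from itertools import product


def hits(target, reported):
    """Count digits reported in the same serial position as in the target."""
    return sum(1 for t, r in zip(target, reported) if t == r)


def score_ear_order(left, right, response):
    """Scoring I: score each half-span against one channel, by position."""
    n = len(left)
    # Assumption: if the first digit is in neither channel, score left first.
    if response[:1] and response[0] in right and response[0] not in left:
        first, second = right, left
    else:
        first, second = left, right
    return hits(first, response[:n]), hits(second, response[n:2 * n])


def systematic_orders(left, right):
    """Candidate report orders under the assumptions stated in the lead-in."""
    n = len(left)
    channels = (left, right)
    orders = set()
    for picks in product((0, 1), repeat=n):
        # Ear order and "attempted ear order": one ear per serial position
        # in the first half, the other ear at that position in the second half.
        first = "".join(channels[p][i] for i, p in enumerate(picks))
        second = "".join(channels[1 - p][i] for i, p in enumerate(picks))
        orders.add(first + second)
        # Temporal order: pair by pair, either member of each pair first.
        orders.add("".join(channels[p][i] + channels[1 - p][i]
                           for i, p in enumerate(picks)))
    return sorted(orders)


def score_systematic(left, right, response):
    """Scoring II (sketch): score against the best-fitting systematic order."""
    n = len(left)
    best = (0, 0)
    for order in systematic_orders(left, right):
        halves = (hits(order[:n], response[:n]),
                  hits(order[n:], response[n:2 * n]))
        if sum(halves) > sum(best):
            best = halves
    return best


left, right = "653", "924"
for response in ("924653", "624953", "654923", "695234", "692534"):
    print(response,
          score_ear_order(left, right, response),
          score_systematic(left, right, response))
```

Because the plain ear orders are among the candidates, this version of Scoring II can never fall below Scoring I, which agrees with the relation between the two scorings described in the text.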


RESULTS AND DISCUSSION 


Figure 2 presents the mean number of 
items recalled by each group for each series 
length as determined by the two scoring 
procedures. Figure 3 more clearly reveals 
the overall group differences by combining 
the data from both half-spans recalled. 
These same data are presented in Figure 4 as 
a proportion of the total number of items 
presented that were correctly recalled. It is 
readily apparent from the three figures that 
Group NCA is much superior to any of the 
other three groups, with NMA falling above, 
but clustering with Groups F and O, in that 
order. From these figures the differences be- 
tween Scoring I and Scoring II are also 
readily apparent. As Scoring II contains all 
the information of Scoring I, plus any addi- 
tional information retained by strategies of 
recall other than “ear order,” the results of 
the second scoring are always above those of 
the first (compare Scoring I and Scoring II 
on Figures 2, 3, and 4). 

* A question might be raised as to the effects of 
chance on the scores assigned to responses. How- 
ever, as Bryden (1962, p. 293) has pointed out, the 
likelihood of guessing the correct numbers in one of 
the systematic orders is extremely low. 

FIG. 2. Amount recalled from series varying in length, summing across the trials of each series length. (Axes: series length in number of digits per channel versus mean total digits correct; curves for the first and second half-spans under each scoring.) 


After subjecting the original proportions 
to the standard arcsin transformation (cf. 
Snedecor, 1956), three Lindquist (1953) 
Type VI analyses of variance were com- 
puted for each scoring of the data. These 
analyses are summarized in Tables 3 and 4. 
FIG. 3. Total amount recalled in both half-spans from series varying in length, 
summing across the trials of each series length. 


A contrast of Tables 3 and 4 reveals little 
difference in discriminability by the two 
scoring procedures used. In both cases the 
three main effects of Group, Series Length, 
and Half-span were highly significant on 
all analyses, with Length X Half-span the 
only interaction consistently so. 


Consider these results in detail. The over- 
all group effect shows the normal Ss (both 
NCA and NMA) to be superior to retardates. 
Such differences must be considered in the 
light of the Group X Half-span, Group X 
Series Length, and the Group X Half-span 
X Length interactions. The Group X Half- 
span interaction is of particular interest here 
because group differences are expected to 
occur on both halves of information pre- 
sented. The degree to which the main Group 
effect is influenced by Half-span and Series 
Length directly tests the questions about 
the P and S systems as advanced in the in- 
troduction. It can be noted that the Group 
X Half-span interaction is not significant. 
This suggests that the main Group effect is 
to be interpreted in terms of differences in 
both the P and S systems. 

FIG. 4. Proportion of items recalled as a function of series length. 

The hypothesis with regard to the Group 
X Length interaction was that group dif- 
ferences should be minimal on the shorter 
series and increasingly in evidence as series 
length increased. This interaction, as well 
as the Group X Length x Half-span triple 
interaction was found to be significant on 
the O x F x NCA analysis. The changes 
representing these interactions can be il- 
lustrated as in Figure 5. Using the critical 
difference technique of determining signifi- 
cant group differences (Lindquist, 1953, pp. 
271-272), the above prediction was found to 
be generally true in the first half-span at- 
tended to; that is, no group differences were 
found on the short series, but the groups 
did differ significantly on the longer series 
(see Table 5). In Broadbent's (1958) terms, 
it would appear the P system (measured by 
the first half-span) of all groups can ade- 
quately handle a relatively small amount of 
information. However, whereas the NCA Ss 
are able to increase the load of the P system 
(up to a point) with increased input of in- 
formation, the P capacity of retardates ap- 
pears to remain about the same (see Figure 
2) across input conditions. 

FIG. 5. Group differences in each half-span as a function of series varying in length. 

TABLE 5 
SIGNIFICANT MEAN GROUP DIFFERENCES FOR BOTH HALF-SPANS, FIRST SCORING OF EXPERIMENT II 

                     Series length in digits per ear    Critical difference 
Groups compared         2       3       4       5         for p < .05^a 
First half-span 
  O versus F           n.s.    n.s.    n.s.    n.s.          10.76 
  O versus NCA         n.s.   16.91   23.63   10.93          10.76 
  F versus NCA         n.s.   11.39   18.32    n.s.          10.76 
Second half-span 
  O versus F           n.s.    n.s.    n.s.    n.s.          14.07 
  O versus NCA        39.83   25.32    n.s.    n.s.          14.07 
  F versus NCA        29.21   23.32    n.s.    n.s.          14.07 
  O versus NMA^b      21.65 

a The critical differences on this table were calculated from the Mean Square within cells obtained 
from Lindquist Type I analyses carried out separately for each half-span of data (cf. Lindquist, 1953, 
pp. 271-272). These results are summarized in the Appendix. 

b All other comparisons of O X F, O X NMA, or F X NMA were not significant. 
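
The critical difference technique referred to above compares each observed difference between two group means with a single threshold derived from the within-cells mean square. The sketch below assumes the usual form of that threshold, t times the square root of 2 MS(within) / n; the mean squares themselves are reported only in the Appendix, so the function is checked here against the tabled thresholds rather than recomputed from them. The function names are illustrative.

```python
import math


def critical_difference(ms_within, n_per_group, t_crit):
    """Smallest difference between two group means that reaches significance.

    Assumption: the usual form t * sqrt(2 * MS_within / n), with the
    tabled t value for the error degrees of freedom supplied by the caller.
    """
    return t_crit * math.sqrt(2.0 * ms_within / n_per_group)


def is_significant(mean_difference, threshold):
    """True if an absolute mean difference reaches the critical difference."""
    return abs(mean_difference) >= threshold


# Differences and thresholds transcribed from Table 5 (O versus NCA).
print(is_significant(16.91, 10.76))  # first half-span, three digits per ear
print(is_significant(39.83, 14.07))  # second half-span, two digits per ear
```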

In the second half-span (second ear at- 
tended to) the above prediction did not hold 
true. In fact, just the reverse was the case. 
Group NCA had significantly better recall 
than Group O or F on the shorter series, but 
such differences diminished to virtually none 
at the longest (Table 5). It seems, then, that 
storage capacity of retardates is either uni- 
formly poor in encoding or else subject to 
rapid decay, the latter being Broadbent's 
hypothesis. If we agree with Broadbent 
(1958) and with Broadbent and Gregory 
(1961) that the first scoring procedure used 
here is the only valid measure of the S-sys- 
tem, and for the sake of parsimony that the 
same process occurs in both normals and re- 
tardates, then decay would seem to be the 
correct answer. In Experiment I (Fig. 1, 
above) it was noticed that the S system of 
retardates operates about as well as that of 
normals at the very short series (one digit 
per ear). With longer series, even as short as 
two digits per ear (as in this study), recall 
from the S system falls off and remains low. 
Similarly, though the short-term storage 
capacity of normals is relatively good at 
the shorter series (noted above), as the nor- 
mal S has to attend to an increasingly longer 
series with his P system, the information in 
the S system is increasingly subject to de- 
cay. Such stimulus decay seems, though, to 
occur somewhat more slowly in the CA con- 
trol than in the mentally retarded so that a 
lack of difference is found between NCA and 
the retarded groups only at the five-digit 
series. 

Earlier it was noted that the main group 
effect of the O x F x NMA comparison was 
significant. This significance was found in 
both Table 3 and Table 4 above. The sug- 
gestion thus was that Group NMA was su- 
perior to one or both of the mentally re- 
tarded groups. A more detailed analysis of 
Scoring I with the same three groups, but 
considering the data from each half-span 
separately, found that the P system (first 
half-span) of Group NMA as a whole, 
though approaching it at points, by para- 
metric analysis never differs significantly 
from that of either Group F or Group O. The 
S system of Group NMA, however, is su- 
perior, but only to Group O and this on the 
shortest series alone. A similar analysis of 
Scoring II presented the same results. Taken 
in the light of the O X F x NCA compari- 
son above these results suggest that the ca- 
pacity of the P systems of Groups O, F, 
and NMA are very similar. Furthermore, 
these normals of comparable mental age 
seem to have the same problem of short- 
term storage decay that the mentally re- 
tarded do (this might be taken as suppor- 
tive evidence for the utility of the concept of 
"mental age"). The fact that Group NMA 
is significantly superior to Group O on the 
shortest of the series may well indicate that 
the information decay process in the S sys- 
tem of these normals is somewhat slower 
than in the retardates, but not quite as slow 
as that of Group NCA. 

Consider next the two main factors of 
half-span recalled, and series length. All 
previous evidence (i.e., Broadbent, 1954; 
Dodwell, 1964; Inglis & Sanderson, 1961) 
suggested that gross and significant differ- 
ences would be found between the two half- 
spans; that is to say, the first channel at- 
tended to should be much more accurately 
recalled than the second. The results of this 
study corroborate such previous research 
and thus are supportive of Broadbent's the- 
ory in this respect. The significant series 
length effect is similarly of interest only 
insofar as it corroborates previous evidence; 
namely, that an increase in length of series 
nets a decrease in percentage of items 
correctly recalled. 


EXPERIMENT III 


Earlier it was noted that normal Ss will 
generally use an “ear order" of recall in the 
dichotic situation with an occasional lapse 
to some other recall order. This statement 
holds true primarily for rates of presenta- 
tion as rapid as one pair per half second. 
Broadbent (1954) and Bryden (1962) have, 
however, also shown that at slower rates of 
presentation, such as one pair per 2 seconds, 
one of the other recall strategies, primarily 
recall by temporal order, will more fre- 
quently be used; that is, the numbers will 
tend to be recalled in their order of arrival. 
As was noted in the introduction, although 
Broadbent (1958) accounted for this phe- 
nomenon in terms of a “switching mecha- 
nism,” it was felt that this change in recall 
might better be thought of in terms of recall 
strategy. At slow rates of presentation S can 
most readily recall the information in the 
temporal order; whereas, at faster rates of 
presentation it becomes optimal for the nor- 
mal S to focus attention on all the informa- 
tion presented in one channel first before 
switching, thus avoiding the wastage of time 
spent in switching and a concomitant loss of 
information. 


Evidence from both Experiments I and II 
showed that even at the fastest rate of 
presentation mentioned above, retardates 
frequently will use some order of recall other 
than the ear order. That is, they tend to al- 
ternate their attention, as the normals (par- 
ticularly Group NCA) do not. This experiment 
was designed, then, to investigate whether 
the retardates can be induced to use the 
more efficient ear order of recall by using 
a rate of presentation faster than that previ- 
ously used. 


METHOD 


Subjects 


Dodwell (1964), in a carefully controlled series 
of studies, found practice effects on this type of 
task to be virtually nonexistent; thus the same Ss 
were used in this experiment as in Experiment II; 
namely, two groups of retarded (organic and fa- 
milial) and two normal control groups (MA and 
CA controls). There were 15 Ss per group. 


Procedure 


The equipment used in this experiment was the 
same as that used previously. 

Twenty-four three-pair series of numbers were 
recorded, using numbers from 1 to 10. Each series 
consisted of six different numbers. Six series were 
recorded at each of the following rates: one pair 
per 2 seconds, one pair per second, one pair per 
half second, and one pair per quarter second. The 
four conditions were presented in a partially coun- 
terbalanced order, half of the Ss in each group be- 
ginning with the slow rate of presentation, and 
half with the fast rate. 
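
The presentation conditions just described can be laid out as a small schedule. The four rates and the three-pair series length come from the text; the assumption that each S ran the rates in strictly ascending or descending order, and the alternation of starting order by subject index, are illustrative guesses, since the text says only that the order was partially counterbalanced.

```python
# Seconds per digit pair, slowest first, as named in the text.
RATES_SECONDS_PER_PAIR = (2.0, 1.0, 0.5, 0.25)


def onset_times(n_pairs, seconds_per_pair):
    """Onset time, in seconds, of each simultaneous digit pair in one series."""
    return [i * seconds_per_pair for i in range(n_pairs)]


def condition_order(subject_index):
    """Assumed alternation: even-indexed Ss start slow, odd-indexed start fast."""
    slow_first = list(RATES_SECONDS_PER_PAIR)
    return slow_first if subject_index % 2 == 0 else slow_first[::-1]


for rate in RATES_SECONDS_PER_PAIR:
    print(rate, onset_times(3, rate))

print(condition_order(0))
print(condition_order(1))
```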

Scoring. The same two scoring procedures as 
used in Experiment II were used in this study. 
However, as the second scoring method is an esti- 
mate of over-all correct output, an additional 
measure was obtained in this study: a difference 
measure between the two scoring procedures. It 
was noted in Experiment II, above, that when the 
ear order of report was used predominantly, the 
difference between the two scoring procedures was 
minimal. When, however, some other strategy of re- 
call is utilized by S, then the difference in scores 
obtained by the two procedures increases. The rela- 
tive usage of the ear order of recall, as compared 
with other strategies, can thus readily be deter- 
mined. For normal Ss this Difference score should 
be low at the fast rates of presentation and increase 
at the slower speeds. This would follow from the 
evidence presented by Broadbent (1954) and Bry- 
den (1962) which shows that normals tend to use 
the “ear order” of recall at fast rates of presentation, 
but switch to a temporal order of recall at slower 
rates. 
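
The Difference score is simply the total under the second scoring minus the total under the first. A minimal sketch follows, using the totals from the worked example in Experiment II (a response of 924653 scores 3 and 3 under both procedures; a response of 624953 scores 1 and 1 under the first and 3 and 3 under the second). The function name is illustrative.

```python
def difference_score(scoring_i_total, scoring_ii_total):
    """Scoring II total minus Scoring I total for one response or one S.

    A value near zero indicates ear-order recall; larger values indicate
    that some other systematic recall order was used.
    """
    if scoring_ii_total < scoring_i_total:
        raise ValueError("Scoring II should not fall below Scoring I")
    return scoring_ii_total - scoring_i_total


# Totals taken from the worked example in Experiment II.
print(difference_score(3 + 3, 3 + 3))  # ear-order response 924653
print(difference_score(1 + 1, 3 + 3))  # attempted ear-order response 624953
```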
RESULTS AND DISCUSSION 

As was indicated above, the clue as to 
whether or not retardates can be induced to 
use the more effective ear order of recall lies 
in the difference scores. Figure 6 presents the 
difference scores graphically. The prediction 
that these scores should be low at the fast 
rates of presentation and large on the slower 
speeds was tested by a Lindquist Type I 
analysis of variance, summarized in Table 6. 


FIG. 6. Group differences in the use of recall strategies as a function of rate of series presentation, 
calculated by taking the difference between the two scoring procedures used. (Axes: rate of presentation in seconds versus mean difference score; curves for Groups O, F, NMA, and NCA.) 
TABLE 6 
SUMMARY OF ANALYSES OF VARIANCE OF DIFFERENCE SCORES, EXPERIMENT III 

                                        Groups analyzed 
                          O X F X NCA          O X F X NMA         NMA X NCA 
Source of variation     df    MS      F         MS      F        df    MS      F 
Between Ss 
  Groups (G)             2   75.09   1.16      18.20   n.s.       1   53.34   1.19 
  Error (b)             42   64.90             55.90             28   44.81 
Within Ss 
  Presentation 
    Speed (Sp)           3  282.12  26.59***  116.84  11.29***    3  354.89  31.43*** 
  G X Sp                 6   60.45   5.70***    9.66   n.s.       3   59.60   5.28*** 
  (Sp X S)w            126   10.61             10.35             84   11.29 


The significant main effect of presentation 
speed indicates that the groups do indeed 
change their recall strategy with shift in 
presentation speed, and a glance at the 
graph indicates that the shift is in the pre- 
dicted direction. 
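
Each F ratio in Table 6 is the mean square for the effect divided by the mean square of its error term: Error (b) for the between-Ss effect of Groups, and (Sp X S)w for the within-Ss effects. The short check below recomputes the reported ratios from the tabled mean squares; the dictionary layout and names are illustrative only.

```python
# (effect MS, error MS, reported F), transcribed from Table 6.
TABLE_6 = {
    "O X F X NCA, Groups": (75.09, 64.90, 1.16),
    "O X F X NCA, Speed": (282.12, 10.61, 26.59),
    "O X F X NCA, G X Sp": (60.45, 10.61, 5.70),
    "O X F X NMA, Speed": (116.84, 10.35, 11.29),
    "NMA X NCA, Groups": (53.34, 44.81, 1.19),
    "NMA X NCA, Speed": (354.89, 11.29, 31.43),
    "NMA X NCA, G X Sp": (59.60, 11.29, 5.28),
}


def f_ratio(ms_effect, ms_error):
    """F is the effect mean square divided by its error mean square."""
    return ms_effect / ms_error


for label, (ms_effect, ms_error, reported) in TABLE_6.items():
    print(label, round(f_ratio(ms_effect, ms_error), 2), reported)
```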

The significant Group X Presentation 
Speed interaction at two of the three levels 
tested, however, suggested that some groups 
are more affected by the change in speed 
than others. Most noteworthy is Group 
NCA, as can be seen in Figure 6. This group 
most closely follows the prediction as out- 
lined, showing a very marked shift. In other 
words, at high speed of presentation this 
group uses the ear order almost exclusively 
(as measured by Scoring I), but as the speed 
of presentation is slowed, Group NCA comes 
to depend largely on other strategies of re- 
call. Group NMA, though not as markedly, 
is also strongly affected by presentation 
speed. A Treatments X Subjects analysis of 
variance on the NMA data alone revealed 
the effect of speed as highly significant 
(F = 6.78; df = 3, p < .001). 

Of the two groups of retardates only 
Group F changes strategy to a significant 
degree. A Treatments X Subjects analysis
of variance obtained F = 3.20; df = 3, 42;
p < .05. The treatment differences, however,
were significant only at the extreme graphic
points (t = 2.18, df = 14, p < .05) and thus
have to be accepted with caution. Group O,
on the other hand, was only mildly affected
by the change in speed (t test between lowest
and highest graphic points nets t = 2.09,
df = 14, p < .1).
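
The Treatments X Subjects analyses cited above can be sketched as a single-factor repeated-measures analysis of variance. The function below is a minimal illustration rather than the author's own computation; its name, the data layout, and the example scores are assumptions. With 15 Ss and 4 presentation speeds it gives df = 3, 42, the values reported for Group F.

```python
def treatments_by_subjects_f(scores):
    """Repeated-measures F for scores[s][t] (subject s, treatment t).

    Returns (F, df_treatments, df_error). Hypothetical helper for illustration.
    """
    n = len(scores)        # number of subjects
    k = len(scores[0])     # number of treatments (e.g., presentation speeds)
    grand = sum(sum(row) for row in scores) / (n * k)
    treat_means = [sum(row[t] for row in scores) / n for t in range(k)]
    subj_means = [sum(row) / k for row in scores]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_treat = n * sum((m - grand) ** 2 for m in treat_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    # The Treatments X Subjects residual serves as the error term.
    ss_error = ss_total - ss_treat - ss_subj

    df_treat = k - 1
    df_error = (k - 1) * (n - 1)
    f_ratio = (ss_treat / df_treat) / (ss_error / df_error)
    return f_ratio, df_treat, df_error


# Made-up difference scores for 15 Ss at 4 presentation speeds.
example_scores = [[s + t + (s * t) % 3 for t in range(4)] for s in range(15)]
f_ratio, df_treat, df_error = treatments_by_subjects_f(example_scores)
print(df_treat, df_error)
```

Subject-to-subject differences are removed from the error term here, which is why a modest shift across speeds can still reach significance within a single group.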


From these results, then, we can conclude 
that normal Ss tend to be quite flexible in 
their use of recall strategy. The attempt at 
inducing the retarded to use the more op- 
timal ear order of recall can be regarded as a 
modest success only. Far more impressive is 
the rigidity in the behavior manifested by 
these two groups. Retardates generally, and 
particularly those of Group O, tend to show
very little inclination toward changing their 
pattern of recall, even when it is strategic to 
do so. 


EXPERIMENTS IVa AND IVb


The Broadbent (1958) attention hypoth- 
esis suggests that when dichotic information 
is received at a fast pace the best strategy 
that S can adopt (indeed, is almost forced
to adopt by the hypothetical switching
mechanism) is to "fix his attention on one
ear" and perceive the digits presented to
that ear at the time they are presented. The
S holds the stimuli presented to the other
ear in the S system and goes back to perceive
them later. The digits are, so to speak,
lined up in the P system in the order in which
they are perceived, and so are most easily
recalled in that order. If they were to be
recalled in any other order (for example, pair
by pair), then they must be rearranged,
which is difficult just as it is difficult to re- 
arrange an ordinary list of digits and say 
them backwards. 

Yntema and Trask (1963), however, have 
suggested that recall performance entails 
more of a search process. They assume, in 
opposition to Broadbent, that both members 
of a pair are perceived and stored in memory
at the time of presentation. The processor 
(S) then adopts a search plan (a term taken 
from Miller, Galanter, & Pribram, 1960), 
with certain search plans being more readily 
executed than others. It is, for instance,
easier for the search to go forward in terms 
of presentation order than to go sideways 
(as in the recall of individual pairs) or back- 
wards. Following this line of reasoning 
Yntema and Trask suggest that the items 
are most easily retrieved ear by ear be- 
cause they have no other characteristic that 
so neatly divides them into two groups 
within which the processor may proceed in 
temporal order. If, however, another promi- 
nent set of characteristics or tags were 
made available to S, the search process 
should just as readily follow this order as 
the ear order. Consider the following example:

    Left ear    Right ear
    0           Good
    Room        2
    5           Coil

Each pair contains both a digit and a word; 
and the pairs are presented to S at one 
per half-second. According to Broadbent's 
(1958) attention hypothesis, S should most 
readily recall the information from one ear 
and then the other. Recalling words and 
then digits should be difficult, from Broad- 
bent’s point of view. According to the search 
hypothesis, however, the S should just as 
readily, or perhaps more readily, recall the 
items by type of information as by ear or- 
der; perhaps more readily because each 
item is unambiguously tagged as a word or 
a digit, but there may at times be a little 
uncertainty about the side on which it is 
heard. Evidence found by Yntema and 
Trask (1963), as well as by Gray and 
Wedderburn (1960) and by Bryden (1962, 
1964), tends to support this latter line of 
reasoning. 
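
The two report orders being contrasted can be made concrete with a short sketch using the example series above. The function names are hypothetical, and the choice of which ear or which type is reported first is arbitrary.

```python
def by_sides(pairs, first="left"):
    """Report one ear's items in temporal order, then the other ear's."""
    left = [l for l, _ in pairs]
    right = [r for _, r in pairs]
    return left + right if first == "left" else right + left


def by_types(pairs, first="digits"):
    """Report all digits in temporal order, then all words (or the reverse)."""
    items = [item for pair in pairs for item in pair]
    digits = [item for item in items if item.isdigit()]
    words = [item for item in items if not item.isdigit()]
    return digits + words if first == "digits" else words + digits


# Each tuple is one simultaneous pair: (left ear, right ear).
example = [("0", "Good"), ("Room", "2"), ("5", "Coil")]

print(by_sides(example))                 # ear order
print(by_types(example, first="words"))  # type order
```

Under the attention hypothesis the first ordering should be the easier report; under the search hypothesis the second should be at least as easy, since every item carries an unambiguous word-or-digit tag.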

This experiment was designed to test 
whether the mentally retarded could adopt 
a given strategy of recall (search process) 
as readily as normals. Previous experiments
(cf. Gray & Wedderburn, 1960; Yntema & 
Trask, 1963) used familiar words, or word 
phrases. It was felt, however, that Ss used 
in this experiment would be more equally
familiar with letters of the alphabet than
words. With this in mind, 10 letters, A, E, 
I, O, U, Y, L, M, R, X, were chosen. It can 
be noted that except for I and Y none of the 
letters rhyme with another, and that they 
all are spoken as a single syllable as the 
digits are. A short study was carried out to 
ensure the equivalence of these materials. 
This is described as Experiment IVa, below. 


Experiment IVa 


Ten mentally retarded Ss, of the familial- 
cultural variety, naive with regard to the 
dichotic listening task, were obtained from 
Linekona School. Retarded rather than nor- 
mal Ss were used as it was felt that of the 
two the retarded Ss should find the letters 
and numbers least equivalent. 

Sixteen three-pair series, of which 8 contained
only letters and 8 only digits, were
randomly arranged and recorded on the
stereo tape in the same manner as in the 
previous experiments. The pairs within a 
series were presented at the rate of one 
pair per half second. The instructions used 
for the practice series were the same as 
those used in Experiment I. 

Results. As in previous experiments, both 
scoring techniques were used here. For Scor- 
ing I the Mean Total number of items re- 
called was: numbers = 24.8, letters = 20.7; 
the t of the difference = .47; df = 9, and 
thus not significant. For Scoring II: numbers
= 32.6, letters = 29.3; t = .19, df = 9,
and similarly not significant. It thus seems 
that though the recall of numbers is slightly 
better than that of letters, this difference is 
minimal and at a chance level. 
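
The comparisons above report t with df = 9 for 10 Ss, which is consistent with a t test on paired (within-subject) scores. The source does not name the test, so that reading is an assumption. A minimal sketch of the paired statistic follows, with a hypothetical function name and made-up scores.

```python
import math


def paired_t(x, y):
    """Paired-samples t for two equal-length score lists. Returns (t, df)."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Unbiased variance of the difference scores.
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    t_value = mean_diff / math.sqrt(var_diff / n)
    return t_value, n - 1


# Made-up recall totals for four Ss (numbers vs. letters), not the study's data.
numbers = [3, 5, 4, 6]
letters = [1, 2, 2, 3]
t_value, df = paired_t(numbers, letters)
print(round(t_value, 3), df)
```

With 10 Ss the same function returns df = 9, matching the degrees of freedom reported for both scoring procedures.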


Experiment IVb 

The four groups of Ss (O, organic re- 
tarded; F, cultural-familial retarded; 
NMA, normal MA control; and NCA, nor- 
mal CA control) used in previous experi- 
ments were used in this experiment as well, 
with the exception of two Ss from Group O
who, with their matches in the other groups,
were dropped because of inability to
maintain attention. Thus, there were 13 Ss
in each subject group. 

Equipment was the same as that used
previously. The dichotic series each consisted
of three pairs, a pair being a letter of
the alphabet presented to one ear and a
digit presented simultaneously to the other
(see Table 7). The digits were three different
digits (from 1 to 10), and the letters
any three of those used in Experiment IVa.
The pairs of a series were recorded at half-second
intervals and are presented in Table 7.

TABLE 7
MATERIALS PRESENTED TO SUBJECTS IN EXPERIMENT IVb

                   Channel 1    Channel 2
Practice series    4um          al7
                   oei          806
                   85e          ui6
                   3m9          oTr
                   4y0          x3u
Test series        034          Loi
                   y8r          7i3
                   96a          Ly4
                   42u          208
                   eyL          560
                   9L8          e6i
                   o8a          3m9
                   095          4iu
                   ux0          72y
                   lar          L69
                   786          uyL
                   Ory          e94
Practice series    9a5          x24
                   eL3          7ly
                   Oma          162
Test series        al7          4um
                   xLo          816
                   8m9          L2a
                   e0a          5y9
                   xm5          83i
                   4e3          i0x
                   umL          968
                   840          ru2
                   32i          ao7
                   Lmr          502
                   5xL          i74
                   y27          9mx
Practice series    7xl          a9r
                   rm4          20e
                   9yo          e65
Test series        u8y          4r5
                   6ex          137
                   6y4          r8a
                   oax          832
                   06L          arg
                   3y5          e2i
                   423          mrL
                   oi4          96r
                   6au          L05
                   m53          7xr
                   xeL          396
                   080          ry4
At the beginning of the session E repeated
the 10 letters to S, indicating that they were 
the vowels plus four consonants. The Ss 
then repeated the letters back to E. The S 
was then informed that this experiment, like 
the previous one, would always have six 
items, but always contain three numbers 
and three letters. The letters heard would 
always be three of those S had just learned. 
The S was also instructed to try to say ex- 
actly six items after every series, guessing 
when he could not remember. 

Three conditions were used. In the Pairs 
condition Ss were instructed to report the 
first pair of items, then the second pair, 
and then the third. In this condition E al- 
ways illustrated what was wanted by pre- 
senting S with an example, and then indi- 
cating which items belonged together. This 
continued until S understood what was re- 
quired of him. In the Sides condition half of 
the Ss in each group were instructed to give 
the items on the left in temporal order and 
then the items on the right in temporal or- 
der; with the other half left and right were 
reversed. In the Types condition half were
instructed to give the digits in temporal 
order and then the letters in temporal order; 
with the other half digits and letters were 
reversed. 

Each S made 12 trials under each condi- 
tion. The 12 trials were made in a block and 
were preceded by 3 or (for the first block) 
5 practice trials made under the same con- 
dition. Order of conditions was balanced 
across Ss within a group. The 12 lists in a 
block included three of each of the four pos- 
sible kinds—that is, no crossings (the digits 
all on one side and letters on the other), a 
crossing after the first pair, a crossing after 
the second pair, and two crossings. 
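
The four kinds of list can be told apart by noting which channel carries the digit in each pair and counting where that side changes. The sketch below is illustrative only; the function names are hypothetical, and the sample series is read from Table 7 on the assumption that each three-character entry lists one channel's items in temporal order.

```python
def digit_sides(series):
    """Return 'L' or 'R' per pair, naming the channel that carried the digit."""
    sides = []
    for left, right in series:
        if left.isdigit() == right.isdigit():
            raise ValueError("each pair must hold exactly one digit and one letter")
        sides.append("L" if left.isdigit() else "R")
    return sides


def crossing_kind(series):
    """Classify a three-pair series by where the digit changes channel."""
    first, second, third = digit_sides(series)
    after_first = first != second
    after_second = second != third
    if after_first and after_second:
        return "two crossings"
    if after_first:
        return "crossing after the first pair"
    if after_second:
        return "crossing after the second pair"
    return "no crossings"


# Channel 1 "y8r" against Channel 2 "7i3", taken pair by pair.
series = [("y", "7"), ("8", "i"), ("r", "3")]
print(crossing_kind(series))
```

A list with no crossings makes the Sides and Types reports identical, so the crossed lists are the ones that separate the two strategies.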


RESULTS AND DISCUSSION 


An item was counted as correctly recalled 
only if it was reported in the correct posi- 
tion. Figures 7 and 8 show the results. Lind- 
quist Type I analyses of variance were com- 
puted and found the main effect of recall 
strategy to be highly significant (see Table 
8). A further Treatments x Subjects anal- 
ysis of variance was also calculated for the 
data within each subject group to test for 
the relative effects of the three strategies on 
worthy is that recall is much less accurate
when S is instructed to report by simultane-

those heard on the other.

Fig. 8. The same data as in Figure 7, but plotted to more clearly show group
differences for each strategy of recall.

ure 7, recall by Types is highly superior in
both the retardate groups, while somewhat
better but showing no appreciable difference
in the normals. This finding supports Yntema's
and Trask's conception of the search
hypothesis.

It is to be noted, however, that Yntema
and Trask found Types of material to be
TABLE 8
LINDQUIST TYPE I ANALYSES OF VARIANCE, EXPERIMENT IVb

                                    Groups analyzed
Source of variation    O X F X NMA            O X F X NCA        NMA X NCA
                       df   MS     F          MS     F           df   MS     F
Between Ss
  Groups (G)           2    441    L8         3428   8.01**      1    ae
  Error (b)            42   317               498                2C   HA uA
Within Ss
  Strategy             2    2117   40.71***   2449   42.96***    2    2017   51.72**
  G X Strategy         4    36     n.s.       113    1.98        2    44     1.10
  (Strategy X S)w      84   52                57                 56   39

* p < .05.
** p < .01.
*** p < .001.
superior in recall to the Sides condition,
while no such difference was found in the
normal Ss tested herein. This difference
may well lie in the type of material used for
recall. Yntema and Trask utilized words
and digits, while this study used letters and
digits. As words, of course, are generally
high in meaning as compared with letters
or digits, it may well be that this difference
provided the additional cues to make recall
by Types better than that of Sides. Experiment
IVa of this paper, however, found
that, if anything, recalling letters is slightly
more difficult than digits. The failure to find
the Types strategy easier than the Sides
strategy by the normal Ss in this experiment
then suggests that the cues for materials
used here are no more distinctive than those
indicating which side of the head the items
are heard on.

TABLE 9
STRATEGY DIFFERENCES WITHIN GROUPS

Group    (column headings illegible in source)
O, F     (entries illegible apart from 8.54)
NMA      13.07    15.58    n.s.    4.12
NCA      18.00    18.23    n.s.    2.42

Although normal Ss did not distinguish
between the Types and Sides conditions the
retardates did, finding recall by Types to be
easier than recall by Sides. This result
might indicate that the Sides condition is
somewhat ambiguous: the S is not always
able to distinguish which stimulus item
comes from which side. From the evidence
obtained here, it would seem that the mentally
retarded are not as able to cope with such
ambiguity as are the normals.

Of primary interest to this study was
the comparison between mentally retarded
and normals with regard to their relative
ability to handle the various recall strategies.
In the overall analyses of variance
(Table 8) it was noted that significant
group differences occurred only when Group
NCA was involved. This effect can be demonstrated
by arranging the results as in Figure 8.
Table 10, furthermore, demonstrates
that Group NCA was superior to all groups
on all strategies. In other words, the NCA
Ss were able to utilize even the worst of
these strategies (the Pairs condition) reasonably
well. Indeed, as can be seen in Figure 8,
their poorest performance on the
Pairs condition was about as good as the
best performance of the mentally retarded
in the Types condition. This again is indicative
of the flexibility with which the NCA
Ss can adopt a given strategy as well as utilize
their large P and S capacities.

TABLE 10
SIGNIFICANT MEAN GROUP DIFFERENCES FOR EACH STRATEGY

Groups compared     Pairs    Sides    Types    Critical difference
O versus NMA        n.s.     10.07    n.s.     8.51
O versus NCA        14.00    24.46    17.92    9.60
F versus NCA        11.92    22.00    13.07    9.60
NMA versus NCA      n.s.     14.39    11.46    10.04
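
The reading of Table 10 used in this paragraph can be checked mechanically. The sketch below transcribes the table's entries and lists, for each comparison, the strategies whose mean difference exceeds the value in the last column, which is taken here to be a critical difference (an assumption about that column). Cells printed as n.s. are stored as None, and the names are hypothetical.

```python
# Mean group differences from Table 10; None marks a cell reported as n.s.
TABLE_10 = {
    "O versus NMA":   {"Pairs": None,  "Sides": 10.07, "Types": None,  "critical": 8.51},
    "O versus NCA":   {"Pairs": 14.00, "Sides": 24.46, "Types": 17.92, "critical": 9.60},
    "F versus NCA":   {"Pairs": 11.92, "Sides": 22.00, "Types": 13.07, "critical": 9.60},
    "NMA versus NCA": {"Pairs": None,  "Sides": 14.39, "Types": 11.46, "critical": 10.04},
}

STRATEGIES = ("Pairs", "Sides", "Types")


def significant_strategies(row):
    """Strategies whose reported difference exceeds the row's critical value."""
    return [s for s in STRATEGIES if row[s] is not None and row[s] > row["critical"]]


for comparison, row in TABLE_10.items():
    print(comparison, significant_strategies(row))
```

Every comparison involving Group NCA shows at least two significant strategies, whereas the only significant difference between a retarded group and Group NMA is O versus NMA on Sides.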

Both the retarded groups, however, do 
reasonably well as compared with Group 
NMA on the Types condition. A result of
this nature is probably best interpreted as
indicating that STM capacity, strategy of
recall, and the ability to use such strategies
are closely linked to an individual's
MA (rather than to CA or to IQ), thus also
supporting the concept of MA. On the other
hand, it should be noted that even though
no significant differences exist (by parametric
analysis) between Group NMA and
the two retarded groups, except between O
and NMA on the Sides condition, Groups O 
and F do fall below the NMA performance 
on all levels. This might suggest that even 
when matched in terms of MA the retarded 
Ss are not as readily able to utilize these 
various strategies. 


DISCUSSION AND CONCLUSIONS


The conjecture of a number of psy- 
chologists has been that perhaps the key 
to learning lies in the understanding of 
STM, since it seems reasonable to believe 
that if information cannot pass from STM 
into permanent storage, learning has not 
occurred. The experiments above have
dealt with the problem of STM, and have 
been aimed at investigating it at its point 
of breakdown. 

Capacity. Consider the results of the first 
two experiments in terms of capacity. These 
experiments presented both the retarded 
groups (O and F) as well as the normal 
control groups (NMA and NCA) with di- 
chotic series varying in length from two 
digits per channel to five per channel (one 
to four in Experiment I). It has already 
been noted that in the over-all comparison 
of retardates with normals (both NMA and 
NCA), the respective analyses of variance 
revealed an overall main group effect. This 
overall effect can be broken down for con- 
sideration in terms of the P and S systems. 

First of all, the P system of Group NCA, 
as measured by the first half-set of digits 
recalled in the Scoring I, was decidedly 
superior to that of both retarded groups, as
well as to Group NMA. As was predicted,
this superiority increased with length of
the series to be attended to and recalled. At
the shortest series length no difference was
to be found, but with succeeding series the
difference between these groups tended to
increase. One could conclude from this that
whereas a relatively small amount of information
can be utilized by the retardates,
their P system cannot handle the larger
amounts of information. The effective ca- 
pacity of this system is, then, fairly small 
for them as compared with normal Ss of 
their same chronological age. 

When comparing the P capacity of these 
retardates with the mental-age control, 
though, very little difference is found be- 
tween the groups. The capacity in this 
system seems about the same for organics, 
cultural-familial retardates, and normal 
controls of the same mental age. An exami- 
nation of the data as plotted in Figure 2 
(above) showed that the mean recall for 
this first half-span of Group O was con- 
sistently below that of NMA, but that 
Groups F and NMA are almost indistin- 
guishable. Although the differences between 
Groups O and NMA were not statistically 
significant, the results do suggest that the 
functional P system of the O group tends
to fall below that of NMA. The difference 
that is in evidence might thus well be due 
to their relatively greater difficulty in main- 
taining attention for more than a brief 
period of time, though this is but post- 
experimental conjecture. Suffice it to say 
that the groups did not differ in this respect. 
Such lack of difference in both the O and 
especially the F Ss suggests that the utility 
of “mental age” as a concept is meaningful 
to some degree at least. 

What, then, of the S system? It was in 
this auxiliary storage area that Inglis and 
Sanderson (1961) found differences between 
elderly patients with and without memory 
disorder when such obvious differences 
could not be distinguished by other STM 
techniques. It is likewise the system in 
which Inglis and Caird (1963) later found 
differences in age groups ranging from 11 
through 60 when the groups compared were 
matched on digit span. 
This study similarly found gross differences
in the S system of the mentally
retarded, especially as compared with the 
chronological control. But, whereas the P 
system of NCA was indistinguishable from 
that of the retarded in the short series and 
was superior at the long series (as was predicted),
the relationship in the S system was
found to be just the reverse. That is to say, 
it was at the shortest series length of Ex- 
periment II that the greatest between-group 
differences occurred. If one can extrapolate 
evidence from Experiment I to the results 
of Experiment II, it would be more accurate 
to say that at the very shortest dichotic 
series (the one-digit pair) the S system of 
retardates functions about as well as,
though perhaps slightly below, the level of 
NCA on the average. When, however, the 
dichotic series increases in length so that 
more information has to be stored in the S 
system for longer periods of time, the total 
output (Fig. 2, Scoring I) as well as the pro- 
portional recall falls drastically. As was 
mentioned in Experiment II (above), this 
result is highly indicative of rapid storage 
decay taking place. Such evidence is in 
keeping with the stimulus-decay model of 
Broadbent (1958), and also tends to agree 
with that of Brown (1958). The recall out- 
put of Group NCA, though dropping con- 
sistently, does not fall as rapidly as that of 
the retardates, and thus leads one to con- 
clude that the rate of decay of the memory 
trace in their storage system is slower. 

What of normal children of like MA? 
In the introduction of this paper, it was 
noted that Hermelin and O’Connor (1964), 
using a delayed recall type of task, found 
a faster rate of decay in the STM of imbeciles
(ranging in IQ from 41 to 54) than in a
normal MA control group. The evidence in
Experiment II suggests that this finding is 
a result of decay chiefly occurring in the 
S system. Though the retarded Ss used in 
this experiment were considerably superior 
in intelligence (ranging in IQ from 53 to 79) 
to those used in the Hermelin and O’Connor 
study, a difference in decay rate was never- 
theless in evidence. Group NMA’s recall 
performance from the second half-span is 
significantly better than that of Group O 
at the two-digit series. Whereas most of
the imbeciles used by Hermelin and O'Connor
were probably of an organic nature,
these results would seem to agree with 
theirs, and furthermore, to pinpoint the 
decay to the S system. The fact that such 
statistical significance is not carried over to 
the longer series of this study indicates that 
the decay rate of the S system in normal
Ss at this young chronological age is also 
fairly rapid, though not as rapid as that of 
the mentally retarded. 

Thus far we have considered Experiment 
II only in terms of STM capacity. In agree- 
ment with Broadbent and Gregory (1961) 
the above discussion was, for this purpose, 
limited to the data as presented by the first 
scoring procedure. Broadbent would argue 
that when “errors” occur (which, in some 
cases, are what this writer, Bryden [1962], 
and others would consider to be strategies 
of recall other than ear order) one cannot 
clearly tell how much of the report given 
can be attributed to the P system and how 
much to the S system. For this reason, Scor- 
ing I is considered by Broadbent to be the 
more adequate measure of the capacities of 
these systems. Comparing the statistical 
analyses of Scoring II with those of Scoring 
I suggests that the conclusions reached 
above are not far wrong. The relative rela- 
tionships between the groups stay the same 
in both half-spans (compare Tables 3 and 4 
above). However, greater informational 
output of Ss, as determined by Scoring II, 
would seem to indicate that Scoring I 
slightly underestimates the capacities of the 
two systems. This underestimation would 
appear to be fairly constant for the four 
groups, as the difference between Scoring I 
and Scoring II seems to be about the same 
at all series lengths (compare Scoring I with 
Scoring II in Figs. 2, 3, and 4). 

Strategy. What, then, about group differences
in strategy of recall? Comparing
Groups O, F, and NMA on Figure 2 again 
shows that initially these groups operate 
near capacity (in the P system) for the first 
half-span of Scoring I. This would indicate 
that the ear order of recall is in almost exclusive
use at the short series (thus ensuring
that the decay found in the S system, as dis- 
cussed above, is not just an artifact of the 
strategy utilized by Ss). There is a slight 
increase in recall by use of the ear-order
strategy through the three-digit series and
then a falling off as the series get longer.
However, the total number of digits recalled
continues to climb, using Scoring II, even in
the longer series. This would seem to suggest
a gradual shifting, by these groups, to
strategies of recall other than ear order as
series length increases.

While the above discussion is true for
Groups O, F, and NMA, it is noteworthy 
that Group NCA continues to utilize the 
ear-order technique to a large degree up to 
and including the dichotic series that is four
digits in length. This group then suddenly 
drops the ear order of recall in favor of 
other recall strategies. The total amount of 
information recalled in these two (four 
versus five digits) series lengths remains 
about the same (though, of course, the pro- 
portion of the number presented which are 
recalled continues to drop), but the amount 
recalled by the ear-order strategy showed 
a marked decline. Why this sudden change 
occurs is not easily answered. As noted
above, this change from ear order to alter- 
nate strategies occurs in Groups O, F, and 
NMA, as well as NCA. However, in these 
groups, the shift was gradual and occurred 
earlier in the series. One might conjecture 
that as the STM system becomes increas- 
ingly overloaded the Ss cease to use the 
system that served them best in the past 
and grasp at any system available to them. 
It is, in other words, a shift from an active, 
organizing strategy of recall to a more or 
less passive one. One might almost think of 
this shift as something of a “panic” syn- 
drome, though, of course, no overt panic 
was manifested by the Ss except the oc- 
casional remark to the effect that, “Boy, 
this is getting too hard.” That the ear order 
of recall is both a rational and strategic 
order of recall is indicated by the fact that 
as long as it is retained by the Ss (of all 
four groups), the climb in total number of 
items recalled remained fairly steep with 
increase in series length—until that point 
where other recall strategies came into use. 
At this point the upward trend is flattened, 
or drops (see both Scorings I and II; on the 
first half-span in Fig. 2 particularly). Although
true for all four groups, this change
is most evident for Group NCA. 

To summarize, the results of Experiments 
I and II have revealed that the effective 
storage capacity of the STM of retardates 
is much smaller than that of normals of the 
same chronological age. This is true both 
with regard to the P and the S system. As 
compared with normals of the same MA, however,
retardates are remarkably similar in
P-system capacity, but differ somewhat 
(though not too much) in the S system. 
The relative differences in P capacity of 
Groups O, F, and NMA, as compared with 
NCA, are probably largely a problem of 
encoding strategy used, although, as indi- 
cated below, an inherently smaller capacity 
cannot be ruled out. In terms of S-system 
capacity it would seem that there is a real 
difference between these groups in terms of 
speed with which information in this sys- 
tem will decay—perhaps a maturational 
factor. 

As was noted above, relatively less use 
was made of the ear order of recall by 
Groups O, F, and NMA than by Group 
NCA, except for the two-digit series. The 
third experiment was conducted to see 
whether this situation could be altered by 
varying the rates of presentation of the 
stimulus material. The expectation, derived 
from evidence presented by Broadbent 
(1954) and Bryden (1962), was that at 
very rapid presentation rates (i.e., one digit 
per quarter second) the normal Ss would use 
the ear order of recall almost exclusively,
but that at slow rates (i.e., one pair every
2 seconds) this particular strategy would 
be used very little. As Figure 6 shows, this 
effect did occur in both normal groups, 
though most markedly in Group NCA. The 
retardates, on the other hand, were charac- 
terized by a more fixed technique of recall, 
with Group O manifesting this trait to a 
more marked degree than Group F. Group 
O showed no statistical change in strategy 
from rate to rate. Group F did demonstrate 
a small but significant change when com- 
paring Difference scores from the fastest 
with those of the slowest speeds of presen- 
tation, thus placing it intermediate in posi- 
tion between Groups O and NMA. What 
this evidence tells us is not so much that
normals will change their strategy of recall 
with changes in rate at which information is 
presented; rather, that normals are flexible 
enough, when handling information, to 
search for and utilize better strategies when 
their previous ones have broken down. The 
fact that Group NMA does not manifest as 
marked a shift as that of NCA suggests 
that this procedure is a matter of learning 
and practice, although, perhaps, a minimal 
STM capacity must first be available. The 
two retarded groups tend to exhibit a very 
limited amount of such flexibility, suggest- 
ing that they tend to assume one strategy 
and “hang on to it,” regardless of whether 
or not it is strategic to do so. 

Experiment IVb was similarly carried out 
to study strategy of recall in retardates, but 
was designed to test the ease with which 
they could adopt a given strategy, as well 
as contrasting the search hypothesis of 
Yntema and Trask (1963) with Broadbent’s 
(1958) attention hypothesis. According to 
the search hypothesis it should be no more 
difficult, and perhaps somewhat easier to 
recall dichotic information in terms of types 
of information heard (i.e., digits and letters)
than to report the same information
following the ear order of recall. The results 
revealed that the retardates found it much 
easier to recall the information in terms of 
Types rather than to recall all of the infor- 
mation heard in one ear before that heard 
in the other (Sides). The normals (both 
NCA and NMA), on the other hand, found 
these two strategies of recall to be about 
equal in effectiveness. Both of these results 
would be in keeping with the search hypothesis.

One might query why the retarded should 
find it easier to recall by Types than by 
Sides. One readily available explanation 
would seem to be that recalling the informa- 
tion in the Sides condition is too ambiguous 
for retardates. That is to say, in the dichotic 
situation, especially when the information 
is fed in via headphones, it is relatively easy 
to lose track of what information belongs 
to which ear. For the retardate to keep such 
information distinct seems to be a difficult 
strategy to follow. The Types condition,
however, presents a recall situation in which
the items to be distinguished are clearly 
tagged. Thus, it is this strategy that the re- 
tardates (both Group O and Group F) find 
easiest to handle. Normal Ss, on the other 
hand, find the Sides condition no more diffi- 
cult than that of Types of material. What is 
a difficult strategy for the retardates can be 
handled relatively well by both normal 
groups. 

Group differences. The differences discussed
above could presumably be due
to learned factors, innate factors, or both.
With Group F, a good guess would be that 
many of these differences could be at least 
partly attributed to learning factors. Their 
group name of "cultural-familial" suggests 
this to be the case. Presumably, children 
coming from an inadequate cultural en- 
vironment have not had the opportunity to 
learn the best of encoding strategies or have 
not had sufficient experience to allow them
to evaluate the relative efficiency of the 
various strategies. If so, it may be that de- 
liberate training in the use of relatively 
superior encoding strategies might at least 
minimize differences between a group such 
as this and their fellow age-mates. Such spe- 
cific training would probably have to be 
begun fairly early, though perhaps early 
school age (such as the NMA Ss here) might 
be adequate. The suggestion of early school 
age is based on the observation that the de- 
velopment of immediate memory in the 
NMA Ss used in these experiments was not 
too far advanced, as compared to that of 
normal Ss of about 10 or 11 years of age (cf. 
Inglis & Caird, 1963). The determination of 
what the more sophisticated encoding strat- 
egies for the scholastic situation might be 
must await considerable future research. 

As has been seen above, Group F is more
like NMA than Group O is. It would seem 
that Group O suffers from some additional 
handicap. For them, one cannot be quite so 
certain that theirs is a problem of learning 
how to adequately encode information. 
Rather it might seem, again as their group 
name (organic) implies, that they may well 
be deficient in both P and S systems, and 
that training in strategies would have a 
lesser effect than with, say, the cultural- 
familial type of retardate. Whether or not
this is the case also remains for future re- 
search, probably at least in part of a phys- 
iological nature. Some evidence for such an 
hypothesis has been advanced by Ellis 
(1963), who considers retardation to be 
largely due to stimulus-trace deficits, a po- 
sition not altogether at odds with the one 
taken here. 

The differences between Groups NMA 
and NCA are revealing as to how much 
change occurs between their two respective 
CAs, for in terms of IQ these two groups 
were essentially the same. Inglis and Caird 
(1963) found that the only difference between
normals over the age of 11 was essentially
in the S system, and such changes
as were in evidence were so only as a trend
over a considerable range of ages. Here, 
however, we found that normals changed a 
good deal with respect to the apparent ca- 
pacities of both the S and the P systems, as 
well as in the ability to utilize the best 
strategies available, and furthermore, in 
their flexibility in doing so. There would 
seem to be both a maturational and a learn- 
ing effect here. The differences between 
these two groups in terms of rate of memory 
storage decay, for instance, might well be 
maturational in nature, as perhaps is the 
ability to tolerate large amounts of infor- 
mation and still retain the best strategy. 
On the other hand such factors as flexibility 
in the change to more adequate encoding 
strategies and the search for such strategies 
might well come about with practice and 
experience. 

On the surface it seems somewhat sur- 
prising that the retardates in general, and 
Group F in particular, did as well as they 
did as compared to Ss in the MA control 
group. Casual experience with both normal 
and retarded children, of, say, MA 9, would 
suggest quite marked differences in per- 
formance. The normal child appears to be 
more intelligent in general behavior (de- 
spite the equivalent MA) and certainly 
more adaptable to diverse environmental 
situations. Furthermore, evidence presented 
by Jensen (1965) would corroborate such 
expectations. Perhaps, though, an answer is 
available in the results of this investigation. 
Experiment II showed that these groups (O, 
F, and NMA) had essentially the same P 
capacity and did not differ much in S ca- 
pacity. Experiments III and IV, however, 
revealed that normal Ss are much more 
flexible in their use of strategies for infor- 
mation coding and, furthermore, are able 
to tolerate strategies (such as the Sides con- 
dition, above) more ambiguous in nature. 
The results are, then, perhaps not so sur- 
prising after all. The matching of Groups O 
and F with NMA in terms of MA and digit 
span suggests that, in terms of potential, 
these groups are about the same. Evidence 
from Experiment II corroborates that sug- 
gestion. A casual comparison of the overt 
behavior of these three groups, however, 
tells us that if their potential is the same, 
they certainly don’t seem to be making the 
same use of it. Results obtained in Experi- 
ments III and IV would tend to corroborate 
that observation. In terms of STM, at least, 
these groups seem to have about the same 
potential, but their use of that potential 
does differ. 

In conclusion, the suggestion that retard- 
ates perform so poorly as compared with 
normals largely because of a limited STM 
capacity and the inefficient use of encoding 
strategies would seem to have gained con- 
siderable supportive evidence. The studies 
discussed above demonstrate not only that 
the effective capacity of mental retardates 
is smaller than in normals, but also that 
this limitation may be due, in large part at 
least, to a lack of flexibility, and hence in- 
efficient strategy usage on the part of the 
retardate. 


SUMMARY 


A series of experiments was conducted to 
investigate STM in mental retardates. The 
primary purpose of these experiments was 
to discern whether or not STM capacity 
and/or strategy of encoding information 
could account for some of the differences 
between retardates and normals. This inves- 
tigation was carried out with the dichotic 
listening technique initiated by Broad- 
bent (1958). 

An initial study found the usage of this 
technique to be feasible with retardates. 
This was followed by three major experi- 
ments with four groups of 15 Ss each. The 
groups were as follows: two groups of re- 
tardates, one organic (Group O) and one 
cultural-familial (Group F) in nature, 
matched in mental age and digit span with 
a group of normal controls (Group NMA). 
The fourth group, matched in CA with the 
two mentally retarded groups, served as a 
second normal control (Group NCA). 

In the first experiment dichotic series of 
2, 3, 4, and 5 pairs of numbers were pre- 
sented to the Ss at the rate of one pair 
every half second. This experiment demon- 
strated that the effective STM capacity of 
both retarded groups is much less than that 
of a comparable CA control, but does not 
differ greatly from Group NMA. The evi- 
dence also indicated that the retardates 
were subject to a faster rate of information 
decay in that part of immediate memory 
which has been termed S system by Broad- 
bent, and is tapped by the second half-set 
of digits recalled. Comparing the data from 
the two scoring procedures used, further- 
more, suggested that as information-load 
increased with length of series, Ss tended 
to change in strategy from recalling the 
digits ear by ear (ear order), to other types 
of strategies generally less efficient at the 
rate of presentation used here. Such a 
change occurred later in the series for 
Group NCA, than for Groups O, F, or 
NMA, indicating a greater tolerance for a 
large information load. This shift appeared 
to be a change from an “actively organiz- 
ing” to a “passive” type of recall strategy. 

The second experiment held the length 
of dichotic series constant at three pairs of 
numbers, but varied the rate of presentation 
as follows: one pair per quarter second, one 
pair per half second, one pair per second, 
and one pair per 2 seconds. This experiment 
demonstrated a marked degree of flexibil- 
ity by the normals (both NMA and NCA) 
in their adaptation of different strategies 
of recall to the various rates of informa- 
tional input. Such flexibility was not found 
in the retardates. At rapid rates of presen- 
tation normal Ss tended to report the num- 
bers from one ear followed by numbers from 
the other (as in the first experiment). As 
the rate slowed, the frequency and accuracy 
of this order of report decreased while the 
frequency and accuracy of reporting the 
material in other orders, such as the order 
the information arrived at the ears, in- 
creased. Such a shift was only partly in evi- 
dence in Group F, and not at all in Group O. 

The final experiment similarly tested the 
immediate recall of series six items in length 
presented two at a time (one to each ear) 
but held the rate of presentation constant 
at one pair per half second. In this study, 
however, each pair of items presented to- 
gether consisted of a letter of the alphabet 
and a digit, and the side on which the letter 
was presented varied haphazardly from 
pair to pair. For the retarded Ss (both 
Groups O and F) recall was more successful 
when S was instructed to recall the items of 
one type and then the items of the other 
type than when instructed to report the 
items heard on one side and then those 
heard on the other. Normal Ss (NCA and 
NMA) recalled equally well in both con- 
ditions. The conclusion was that, though 
normals could handle each type of recall 
strategy equally well, retardates had more 
difficulty with the greater inherent ambig- 
uity involved in recalling information by 
sides of the head than by types of material. 

In conclusion, the evidence indicated that 
STM capacity was indeed an important 
difference between retardates and Group 
NCA. This deficit in apparent capacity, 
however, was probably enhanced by the re- 
tardates’ lack of flexibility in the search 
for and use of appropriate recall strategies 
and their manifestation of difficulty with 
ambiguous types of strategies. Though ca- 
pacity was essentially the same for Groups 
O, F, and NMA, the two retarded groups 
also fell below NMA Ss in their ability to 
adopt a flexible mode of behavior, and to 
utilize more ambiguous strategies. The dif- 
ferences between Groups NMA and NCA, 
on the other hand, were indicative of the 
degree to which both memory capacity and 
ability to make use of useful strategies de- 
velop in normal individuals over time. 

APPENDIX 

TABLE A1 
ANALYSES OF VARIANCE ON EACH HALF-SPAN OF DATA, EXPERIMENT I 

                                         Groups analyzed 
                                 O X F X NCA          O X F X NMA 
Source of variation       df     MS        F          MS        F 

First half-span 
  Between Ss 
    Groups (G)             2    35.95     8.23**      6.03     1.46 
    Error (b)             42     4.36                 4.13 
  Within Ss 
    Series Length (L)      3   127.65    81.64***   147.09   122.82*** 
    L X G                  6     1.99     1.28         AT      n.s. 
    (L X S)w             126     1.56                 1.20 

Second half-span 
  Between Ss 
    Groups (G)             2    74.19     8.57***    10.74     2.04 
    Error (b)             42     8.66                 5.26 
  Within Ss 
    Series Length (L)      3   119.70    45.26***    97.27    32.31*** 
    L X G                  6     9.18     3.47**      3.38     1.12 
    (L X S)w             126     2.64                 3.01 

** p < .01. 
*** p < .001. 

A REINFORCEMENT ANALYSIS OF GROUP PERFORMANCE¹ 

ROBERT GLASER AND DAVID J. KLAUS² 

2 studies investigated response feedback and reinforcement contingencies 
occurring in a “team environment.” Study I investigated 3-man series teams 
under conditions of response acquisition, extinction, spontaneous recovery, 
reacquisition and reextinction. Feedback to team members was based solely 
on group output. The results suggest team performance can be manipulated 
using methods which effectively control the behavior of individual or- 
ganisms. Study II investigated 3-man parallel teams in which a reinforced 
team response could occur as a function of correct responding by only part 
of the team. With continued reinforced practice, performance degraded to a 
level equal to or below initial team performance. These findings are analyzed 
in terms of an operant conditioning model of team performance. 


This article describes an approach to 
the experimental study of the condi- 
tions of training that influence the acquisi- 
tion and decay of group performance. The 
general viewpoint taken is one in which 
group behavior is studied in the same man- 
ner in which individual behavior has been 
investigated successfully in the past. How- 
ever, it is the behavior of the team, rather 
than the behavior of its individual mem- 
bers, which is the primary unit of investi- 
gation. The approach follows from experi- 
mental work which emphasizes the feedback 
and reinforcement contingencies that are 
produced as a function of the “group en- 
vironment” (Glanzer & Glaser, 1961; Klaus 
& Glaser, 1960). These contingencies are, 
in turn, a function of (a) the probability 
that appropriate responses will be made by 
group members and (b) the manner in 
which these individual responses are con- 
verted into a collective response. 

The kind of group considered here is 
called a “team.” In contrast to a small 


1 These studies were carried out in the Team 
Training Laboratory at the American Institutes 
for Research as part of an ongoing research pro- 
gram, Increasing Team Proficiency Through Train- 
ing, supported by the Office of Naval Research 
under contract Nonr 2551(00). Reproduction in 
whole or in part is permitted for any purpose of 
the United States Government. 

2 The authors wish to acknowledge the contribu- 
tions of Karl Egerman in conducting Study II and 
of Jerry Short in reviewing the manuscript. 


group, a team is highly structured and has 
relatively formal operating procedures, for 
example, a submarine crew or a baseball 
team. A small group, on the other hand, is 
less formal and has few specialized individ- 
ual tasks, for example, a jury or a commit- 
tee. 

The general characteristics of the analy- 
sis used can be illustrated by the example 
of a two-man team in which a “monitor” 
obtains information and transmits it to an 
"operator" who processes the information 
and transmits the team output. In its sim- 
plest form, this output results in a binary 
contingency, that is, right or wrong, a hit 
or a miss. The team is arranged in series 
since both component members must exe- 
cute a correct response in order for the 
team to produce a correct response. If the 
performance of each member is followed by 
reinforcement only for a correct team prod- 
uct, several predictions can be made about 
the likelihood of the occurrence of correct 
individual and team responses under vari- 
ous conditions. When both men are correct, 
the team response will be followed by rein- 
forcement, and there will be an increase in 
the probability of correct individual re- 
sponses. When both men are incorrect, no 
reinforcement will be forthcoming to either 
member, and the probability of their in- 
correct responses will be decreased. When 
one member responds appropriately and the 
other does not, the subsequent withholding 
of reinforcement will result in an extinction 
trial for the member responding correctly. 
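The contingencies just described for the two-man series team can be laid out as a small truth table. The following is only an illustrative sketch: the function name and the tuple layout are hypothetical, and the code encodes nothing beyond the rule that reinforcement follows a correct team product and that a correct member who goes unreinforced undergoes an extinction trial.

```python
from itertools import product


def series_team_outcomes():
    """Enumerate the four response combinations of a two-man series team.

    Each row is (monitor_correct, operator_correct, team_reinforced,
    members_on_extinction_trial). Names are illustrative assumptions.
    """
    rows = []
    for monitor_ok, operator_ok in product([True, False], repeat=2):
        # Series arrangement: the team product is correct only if both are.
        team_reinforced = monitor_ok and operator_ok
        # A member who responded correctly but received no reinforcement
        # experiences an extinction trial for that correct response.
        extinction_for = [
            name
            for name, ok in (("monitor", monitor_ok), ("operator", operator_ok))
            if ok and not team_reinforced
        ]
        rows.append((monitor_ok, operator_ok, team_reinforced, extinction_for))
    return rows


for row in series_team_outcomes():
    print(row)
```

Only the two mixed rows list a member on an extinction trial, which is the case singled out in the text.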

In a team situation, reinforcing events 
which are contingent upon the team re- 
sponse follow the preceding responses of all 
team members “indiscriminately,” that is, 
every team member is exposed to the same 
event. For example, in the series-linked 
team just described, if one member re- 
sponds incorrectly, no reinforcing feedback 
is presented to the other member even 
though he made a correct response. This 
“confounding”³ characteristic of team re- 
inforcement suggests one way of defining 
a team, that is, a group of individuals who 
are all reinforced by a single event, the oc- 
currence of which depends on the integrated 
responding of at least some of the partici- 
pating members on any one trial. The em- 
phasis in this concept is that group feed- 
back, the reinforcing event, is contingent 
upon a composite of individual perform- 
ance. 

The major purpose of the studies de- 
scribed in this paper is to assess the feasi- 
bility of considering the team as a learning 
unit which reacts to the presence or absence 
of reinforcement following a response as do 
individual organisms observed in a learning 
laboratory. Accordingly, when the team 
product is considered as the response, it 
should exhibit increments or decrements as 
a function of the properties of the stimulus 
contingencies following each trial. For ex- 
ample, the team should acquire proficiency 
in responding when feedback to the team 
is reinforcing and, once acquired, extinction 
of the team response should occur if rein- 
forcing feedback is withheld. Study I was 
designed to test this hypothesis by deter- 
mining the influence of the presence and 
absence of group reinforcement on the per- 
formance of a series team. Study II con- 
sidered the more complicated case of a 
team linked in parallel. In a parallel team 
a correct response by either one or more 
members can produce a correct team re- 
sponse. If team learning can be described 
in terms of the same principles used to de- 
scribe individual learning, the study of in- 
dividual behavior and group behavior 
fruitfully can share common theoretical 
structures, similar experimental techniques, 
and mutual problems. 

3 To use the term suggested by Rosenberg and 
Hall (1958). 


Methodological Perspective 


In studying the performance of a team, 
one level of analysis is to observe the team 
as a responding entity. From this point of 
view, one looks at the stimuli that impinge 
upon the group and observes the properties 
of the group responses that occur. It is as 
if the group were considered an “empty 
organism" and stimulus-response relation- 
ships were observed for study without con- 
sidering internal mechanisms. The study of 
team performance on this level can be 
called “molar” in the sense that the re- 
sponse class under consideration is a group 
product and not the separate responses of 
the individual group members. From a 
molar point of view, group responses can 
be studied as a function of environmental 
contingencies without reference to the indi- 
viduals comprising the team. In contrast, 
it also would be possible to look at the re- 
sponses of individual members from a “mo- 
lecular” point of view. The study of team 
performance on this level would emphasize 
individual behavior as it is influenced by 
the team environment. 

Possible relationships that can be studied 
on each level of investigation and across 
levels are illustrated in Figure 1. The dia- 
gram shows two three-man teams. In Team 
A, the sequence of response events must 
occur in much the same way as events in a 
simple series circuit. In Team B, response 
events also must occur in series, that is, 
each man must perform correctly to com- 
plete the team task but, in this case, two 
team members’ responses serve as the stim- 
ulus inputs to a third member. In Figure 1, 
the capital letters refer to group stimuli 
and group responses. The small letters re- 
fer to stimuli and responses with respect to 
individual group members. In Team A, S 
refers to the stimulus event which initiates 
group activity. S can be considered as an 
external stimulus, that is, coming from out- 
side the group’s immediate environment. R 
is the group response which is a function 
of the performances of the members of the 
team. Sf is the environmental consequence 
brought about as a result of the team re- 
sponse. Sf can act as group feedback if it 
follows the group response. The ovals in 
Figure 1 refer to individual team members. 
s1 refers to the stimulus input for team 
member 1, r1 is his response, and s1f is the 
feedback to him or the consequence of his 
response. Sf and sf may be different events 
depending upon the man-machine team ar- 
rangement, the remoteness of the individual 
team member from the occurrence of Sf, 
and also his opportunity to observe it. The 
notations in Team B have the same mean- 
ings except that Sa and Sb indicate that two 
independent environmental inputs are fed 
into this team. (Sa and s1 and Sb and s2 may 
or may not be identical events depending 
upon the nature of the team task and the 
construction of the communication arrange- 
ment.) 

Legitimate variables for study are any of 
the stimulus and response relationships in 
Figure 1. For example: the relationship be- 
tween group input S, group response R, and 
group feedback Sf; team member response, 
r1, r2, ... rn, as a function of group feed- 
back Sf; the relationship between individ- 
ual feedback sf and group response R; the 
relationship between individual feedback 
sf and individual response r1, r2, ... rn. As 
has been indicated, the relationship between 
the team member response r and group 
feedback Sf is especially interesting in situ- 
ations where the feedback to an individual 
is not the consequence of his own response 
but, rather, the consequence of his response 
as confounded with the responses of other 
group members. 

The two studies described in this paper 
illustrate applications of this approach in 
the study of team performance and team 
learning. Study I emphasizes the relation- 
ship between group response R and group 
feedback Sf, when Sf is considered as a re- 
inforcing stimulus. In this first study, the 
primary concerns are the observable team 
output and team feedback events that oc- 
cur outside the boxes in Figure 1, and the 
determination of orderly functional rela- 
tionships between these events. Study II 


FIG. 1. Reinforcement analysis of series teams. 


considers the effect of group feedback Sf 
on individual team member responses r1, 
r2, ... rn and the subsequent effect on the 
overall group response R. This study ex- 
amines a predicted change in R as a func- 
tion of the learning environment produced 
when the arrangement of team members 
permits group reinforcement Sf following 
inappropriate individual responses. The 
general purpose of these investigations is 
to determine the extent to which team 
learning analyzed at the molar level evi- 
dences relationships similar to those that 
have been identified in studies of individual 
or single organism learning. 


STUDY I: ACQUISITION AND EXTINCTION 
OF A TEAM RESPONSE 


The initial study was designed to con- 
sider the team as a unit which responds to 
the presence or absence of reinforcing 
events by exhibiting increments or decre- 
ments in team performance. The specific 
hypotheses were that a team response will 
be learned when group feedback to the team 
is reinforcing and that extinction of this 
team response will occur if reinforcing feed- 
back is no longer forthcoming. 


Procedure 

The units of investigation were three-man 
teams in which each member was assigned a spe- 
cific task. Team members were organized in a 
series arrangement so that all members were re- 
quired to perform correctly in order to complete 
the team task. The task situation was constructed 
so that no member received any feedback about 
the accuracy of his own or any other member's 
performance until the entire team completed the 
task. When all three members performed cor- 
rectly, the team as a group received knowledge 
of a successful trial (Sf). 
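As a rough illustration of how the linkage of members shapes the chance of a reinforced team trial, the sketch below computes the probability of a correct team response for a series team (all members must be correct) and for a parallel team in which one correct member suffices. It assumes that members respond independently, which the paper does not assert; the function names are hypothetical, and .63 is used only because it is the individual criterion level reported for Study I.

```python
from math import prod


def series_team_probability(member_probs):
    """P(correct team response) when every member must be correct.

    Assumes independent member responses (an illustrative assumption).
    """
    return prod(member_probs)


def parallel_team_probability(member_probs):
    """P(correct team response) when any one correct member is enough.

    Computed as 1 minus the probability that every member is incorrect,
    again assuming independence.
    """
    return 1.0 - prod(1.0 - p for p in member_probs)


members = [0.63, 0.63, 0.63]
print(round(series_team_probability(members), 3))
print(round(parallel_team_probability(members), 3))
```

Under these assumptions a three-man series team at the .63 level is reinforced on about one trial in four, whereas a fully redundant parallel team is reinforced on nearly every trial even though no member has improved.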


Subjects 


Male high school students were employed as 
team members. All subjects (Ss) were at least 16 
years old and were selected so that none was in 
a slow-learning academic group. The Ss were 
paid one dollar an hour. 


Apparatus 


Each S was seated in front of a panel on which 
there was a row of three horizontal lights used to 
present stimulus displays. Below these lights, the 
panel contained a counter at S’s right and a small 
red light to his left. A momentary-contact toggle 
switch by which S made a response was centered 
at the bottom of the panel. From his position at 
his panel each S could see a large counter on the 
wall, but could not see the other two Ss. 

The response required was the estimation of a 
time interval, accomplished by either a 2-second 
or 4-second press of the toggle switch. Tolerance 
for the 2-second press was ±.183 second and for 
the 4-second press, ±.258 second, roughly equating 
the difficulty of the two responses. The S made a 
correct response if he released the switch within 
the allowable tolerances. The Ss wore earphones 
throughout the course of the experiment, and 
musical broadcasts were played to help mask any 
stray apparatus noises which might have served as 
response cues. 
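The scoring rule for a single press can be sketched as follows. This is a minimal illustration, assuming the tolerances are symmetric about the target interval; the constant and function names are hypothetical and do not describe the clock circuitry actually used.

```python
# Target press durations (seconds) mapped to their allowable tolerances.
# Symmetric tolerances are an assumption made for this sketch.
TOLERANCES = {2.0: 0.183, 4.0: 0.258}


def is_correct_press(target_seconds, held_seconds):
    """Return True if the switch was released within the tolerance band."""
    tolerance = TOLERANCES[target_seconds]
    return abs(held_seconds - target_seconds) <= tolerance


print(is_correct_press(2.0, 2.1))  # inside the 2-second band
print(is_correct_press(4.0, 4.3))  # outside the 4-second band
```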


A central control apparatus was located in an 
adjoining room. A 20-pen operations recorder and 
seven associated five-digit counters recorded 
stimulus presentations, responses and reinforce- 
ments to each S, and reinforcements to the team 
as a whole. Twelve clutch-operated adjustable- 
duration clocks timed the responses and allow- 
able tolerances for each S's press of the toggle 
switch. 


Method 


Six teams of three Ss each were trained 1½ 
hours per day (excluding Saturdays, Sundays, 
and holidays) for a median of 31.5 days (range 
21 to 72 days). The experiment consisted of three 
phases of training: (a) Individual Training, (b) 
Stimulus Pattern Training, and (c) Team Train- 
ing. These phases, outlined in Table 1, are de- 
scribed in detail below. 

Individual training. The Ss were randomly as- 
signed to one of the three stations and were 
called, respectively, Monitor One (M1), Monitor 
Two (M2), and operator (Op). Each S was in- 
structed that when he depressed his switch and 
released it at the proper interval, a point would 
register on his counter and the adjoining red light 
would flash for about 1 second. Each daily session 
during this phase consisted of two half-hour pe- 
riods during which Ss practiced each press at 
their own rate, alternating between the 2-second 
and 4-second presses every 5 minutes. Daily prac- 
tice sessions were continued until all three mem- 
bers maintained a proficiency level of .63 or higher 
for four consecutive 5-minute periods. Proficiency 
level was calculated by dividing the number of 
correct presses by the number of total presses for 
each 5-minute period. After the third member of 
the team had reached this criterion, four extra 
5-minute periods of alternate 2- and 4-second 
presses were given in order to provide this last 
team member with additional practice. The me- 
dian number of 5-minute periods required for all 
Ss to reach criterion was 35 (2.9 hours). The range 
was 12 periods to 95 periods (1.0 to 7.8 hours), 
spread over one to eight daily sessions.⁴ 


TABLE 1 
DESCRIPTION AND CRITERIA OF THE TRAINING PHASES IN STUDY I 

Training phase    Description                           Criterion 

Individual        Ss practiced individually on 2- and   Four consecutive 5-minute periods 
                  4-second time estimations.            of 63% proficiency or better. 
Pattern           Monitors practiced timing responses   None 
                  to light patterns. 
Three-man team    A. Ten minutes of individual          None 
                     warm-up. 
                  B. All three Ss performed in a        1. Acquisition: four consecutive 
                     joint effort.                         periods of 10 or more team points. 
                                                        2. Extinction: four consecutive 
                                                           periods of zero team points. 
                                                        3. Spontaneous recovery: two con- 
                                                           secutive periods of zero team 
                                                           points. 
                                                        4. Reacquisition: (same as acquisi- 
                                                           tion). 
                                                        5. Reextinction: (same as extinc- 
                                                           tion). 
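The individual-training criterion (a proficiency of .63 or higher in each of four consecutive 5-minute periods) can be expressed as a simple run-length check. The sketch below is illustrative only; the function names and the example scores are hypothetical and are not data from the study.

```python
def proficiency(correct_presses, total_presses):
    """Proportion of correct presses in one 5-minute period."""
    return correct_presses / total_presses if total_presses else 0.0


def periods_to_criterion(period_scores, threshold=0.63, run_length=4):
    """Return the 1-based period at which criterion is first met.

    Criterion: `run_length` consecutive periods scoring at or above
    `threshold`. Returns None if the criterion is never reached.
    """
    run = 0
    for index, score in enumerate(period_scores, start=1):
        run = run + 1 if score >= threshold else 0
        if run == run_length:
            return index
    return None


# Hypothetical scores: the run is broken at period 4 and restarts.
scores = [0.50, 0.70, 0.70, 0.60, 0.65, 0.70, 0.80, 0.90]
print(periods_to_criterion(scores))
```

The same check, with a different threshold test, would describe the team criteria (for example, 10 or more team points, or zero team points, in each of four consecutive periods).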

Stimulus pattern training. In this phase the 
monitors learned to discriminate between four 
light patterns presented on their individual panels. 
Two of the patterns were used to indicate that 
the monitors should try a 2-second press; the 
other two patterns were used to indicate a 4-sec- 
ond press should be attempted. Later, during team 
training, one of the four light patterns signaled 
the start of a team trial and indicated whether 
the monitors should make a 2-second or 4-second 
response on that particular trial. The pattern dis- 
crimination was a simple task, and the monitors 
learned to perform it adequately in about 10 min- 
utes of self-paced practice during which they re- 
sponded to 30 to 40 patterns. The sequence of 
pattern presentation was random except that no 
identical patterns followed in succession. Since 
the operator did not have to respond to these 
stimulus patterns in the team training phase, he 
did not participate in stimulus pattern training. 

Team training. On the first day of the team 
training phase, Ss were instructed that all three 
of them would use their timing skills in working 
as a three-man team. The team arrangement used 
in this study is illustrated schematically by Team 
B in Figure 1. They were told that when all three 
performed correctly a point would register on the 
wall counter and a bell would ring. On each trial, 
the two monitors were presented with one of the 
four stimulus patterns they had responded to dur- 
ing pattern training. They were instructed that 
they would not receive any immediate indication 
about the accuracy of their own responses (the 
individual panel counter and adjoining red light 
were made inoperable during team training). The 
operator’s panel displayed two light signals each 
indicating the duration of a monitor’s press. If 
the operator judged both presses to be accurately 
executed, he was instructed to make a 4-second 
press. 

If all three team members performed correctly 
—if the monitors had made an appropriate and 
accurate response and if the operator had made 
an accurate 4-second press—a point on the wall 
counter registered, the bell rang, and a new 
stimulus pattern automatically was presented to 
the monitors. All other response combinations 
were counted as incorrect team responses. When 
one or both monitors performed incorrectly, the 
operator's task was to change the stimulus pattern 
and have the team attempt another point by 


4 When an S did not reach criterion after 100 5- 
minute periods, the entire team was dismissed and 
a new three-man team was obtained. During the 
course of the experiment four teams were dismissed 
for failure of one or more of the members to reach 
individual performance criterion. 


making a 2-second press. If the operator's re- 
sponse was correct, a new stimulus pattern was 
presented to the monitors, commencing another 
trial. If the operator performed incorrectly, no 
change in any of the stimulus features occurred. 
When this happened, the operator was instructed 
to respond again, either with a 2-second or a 
4-second press, whichever he judged to be appro- 
priate. As in the case of each monitor, individual 
feedback was eliminated for the operator. (The 
extra complication of the operator's task, i.e., judg- 
ing the monitor’s correctness, turned out to be 
unnecessary and has been omitted whenever ap- 
propriate in subsequent data analyses.) 

The first 10 minutes of each day during the 
team training phase was devoted to individual 
practice as in the individual training phase. On al- 
ternate days, the Ss began either with 5 minutes of 
practice on the 2-second or 4-second press. During 
team training, two rest periods were inserted in 
the daily 1½-hour sessions, one rest period after 
the first 25 minutes of team training and one 
after the next 30 minutes. 


Team Training Conditions 


This first experiment followed an operant con- 
ditioning paradigm. In particular, five learning 
phenomena were selected for detailed examina- 
tion. As indicated in Table 1, these were: (a) 
team response acquisition, (b) team response ex- 
tinction, (c) spontaneous recovery, (d) reac- 
quisition, and (e) reextinction. 

Team response acquisition. During team train- 
ing all three team members had to perform cor- 
rectly on any one trial in order for a point to 
register on the wall counter. The Ss were trained 
to a criterion of 10 or more of these points in 
each of four consecutive 5-minute periods. When 
the criterion was attained, four additional 5-min- 
ute training trials were conducted. Acquisition 
time for the six teams studied varied consider- 
ably. The median number of 5-minute periods 
to reach criterion for all teams was 53.5 (4.5 
hours); the range was from 14 to 402 5-minute 
periods (1.2 to 33.5 hours). 
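Training time is reported throughout both as a count of 5-minute periods and as hours. The conversion is a single multiplication, sketched here with hypothetical names:

```python
PERIOD_MINUTES = 5  # length of one scoring period, as stated in the text


def periods_to_hours(periods):
    """Convert a count of 5-minute periods to hours."""
    return periods * PERIOD_MINUTES / 60


for periods in (14, 53.5, 402):
    print(periods, round(periods_to_hours(periods), 1))
```

On this conversion the acquisition median of 53.5 periods is about 4.5 hours, and the range of 14 to 402 periods is about 1.2 to 33.5 hours.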

Team response extinction. Following acquisi- 
tion to criterion level, the wall counter and bell 
were made inoperable so that no feedback oc- 
curred even though the team continued to per- 
form. A criterion level for extinction was set at 
zero correct team responses in each of four con- 
secutive 5-minute periods. The median number 
of 5-minute periods for the six teams to reach 
this criterion was 56.5 (4.7 hours); the range was 
from 17 to 157 5-minute periods (1.4 to 13.1 
hours). Immediately after a team reached this 
criterion, it was dismissed from the laboratory 
for the day. Ss were instructed to return on the 
next working day.⁵ 


5 At least once during extinction training almost 
every team remarked to the experimenter that the 
apparatus must be broken. When this occurred 


Spontaneous recovery. Upon returning to the 
laboratory, and after completing the usual 10- 
minute warm-up practice on the 2- and 4-second 
presses, the team again was subjected to extinc- 
tion. Whether or not the expected increase in 
response frequency was observed, training was 
continued to a criterion of zero correct responses 
in each of two consecutive 5-minute periods. The 
median number of periods to reach this criterion 
was 10.5 (0.9 hours); the range was from 3 to 113 
5-minute periods (0.3 to 9.4 hours). 

Reacquisition. When the criterion for spon- 
taneous recovery was met, the wall counter and 
the bell were made operable, and the team's cor- 
rect responses again produced points. The cri- 
terion for this phase was identical to the acquisi- 
tion phase and, as in acquisition, four additional 
practice periods were conducted. The median 
number of 5-minute periods to achieve criterion 
in this condition was 24.5 (2.0 hours); the range 
was from 4 to 37 5-minute periods (0.3 to 3.1 
hours). 

Reextinction. Following reacquisition, a second 
extinction phase was carried out. The criterion 
for this phase was identical to that used during 
the extinction phase, four consecutive 5-minute 
periods each with zero correct responses. The 
median number of 5-minute periods required for 
reextinction in five of the six teams was 81 (6.8 
hours); the range was from 20 to 200 5-minute 
periods (1.7 to 16.7 hours).⁶ 


The Change in Environment between Individual and Team Training


When Ss were combined into three-man teams, 
a number of changes in their performance en- 
vironment occurred. The most significant change 
was in reinforcement contingencies and ratios. 
During individual training, each S was on a con- 
tinuous schedule of reinforcement, receiving feed- 
back from the indicators on his own panel. During 
the team phase, when feedback was changed to 
the wall counter, Ss were switched to an aperiodic 
schedule of reinforcement. Thus, on some occa- 
sions a team member might perform correctly 
but, because of the incorrect performance of an- 
other team member, the team did not receive a 
point. This constituted an extinction trial for the 
team member who performed correctly. For ex- 
ample, all members of a team might have an in- 
dividual probability of correct performance of 
.63 (the minimum performance required of all Ss 


the experimenter changed switch positions on the 
central apparatus without being observed, went 
into the laboratory, and secured a team point by 
making the appropriate response at each panel. 
The Ss apparently were reassured that the ap- 
paratus was not malfunctioning. 

* The one team which did not exhibit reextinction
was continued in this condition for a total of
280 5-minute periods (23.3 hours) before being 
dismissed from the laboratory. 
before being introduced into the team setting).
Since all team members were required to perform
correctly on any one trial for a reinforcing event
to occur, the probability of a correct team response
was at least (.63)(.63)(.63) = .25. An S
who received individual reinforcement for 100%
of his correct responses during individual training
now received reinforcement for only about 40%
of his correct responses under team conditions.
The probability of his receiving a team
reinforcement when he made a correct individual
response depended on the joint probability that
the other two team members also were correct on
that trial, which would occur (.63)(.63) = .40, or
40% of the time. In general, each individual's percentage
of reinforcement in a series team is a function
of the proficiency level of the other team members.
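
The arithmetic in this paragraph can be sketched in a few lines of Python. This is only an illustration under the paper's stated assumption that team members perform independently; the function names are hypothetical and do not come from the study.

```python
from math import prod

def series_team_proficiency(proficiencies):
    """Probability that every member is correct on one trial,
    assuming the members perform independently."""
    return prod(proficiencies)

def reinforced_fraction(proficiencies, member):
    """Fraction of one member's correct responses that earn a team point:
    the joint probability that all of the other members are also correct."""
    others = [p for i, p in enumerate(proficiencies) if i != member]
    return prod(others)

# Three members, each at the minimum individual proficiency of .63.
team = [0.63, 0.63, 0.63]
team_p = series_team_proficiency(team)
share = reinforced_fraction(team, 0)
print(f"team proficiency = {team_p:.2f}, reinforced share = {share:.2f}")
```

With three members at .63 the sketch gives the .25 and .40 values quoted above; raising any one member's proficiency raises the reinforced share for the other two.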


Results: Molar Analysis 


Figure 2 shows an individual and a team 
cumulative performance curve. The dashed 
line is the performance curve for one mem- 
ber of a team during individual response 
acquisition training. During the last portion
of his training, this S averaged at least
55 correct responses in each 5-minute period.
This represents a proficiency level of
about 75% correct responses. The solid line 
shows a team response acquisition curve for 
an early pilot team. This curve is quite 
similar to the dashed line. However, the 
rate of responding is less than that shown 
in the individual curve. Both curves, al- 
though representing different rates, show 
highly similar trends in overall shape and 
acceleration suggesting that the perform- 
ance changes that occur during individual 
and team learning are comparable. 


Team Performance Curves 


The major results of Study I are pre- 
sented in the team performance curves 
shown in Figure 3. These curves are plotted
cumulatively, in terms of the number of 
correct team responses per 5-minute period, 
through the five training conditions. The 
number of 5-minute periods required for 
the teams to reach criterion under each of 
the experimental conditions is given in 
Table 2. 

Response acquisition. All six teams ex- 
hibited learning of the team response dur- 
ing initial response acquisition. For all 
teams, there was an increase in the number 
of correct responses indicating that the re- 
quired task was being learned. On the 
average, the mean number of correct re- 
sponses per 5-minute period increased from 
2.9 in the first third of the initial acquisi- 
tion period to 5.7 in the second third and 
to 8.9 in the last third. There was consider- 
able variability in the shape of the curves 
and in the time required to reach criterion. 
The most rapid (Team 5) required only 18 
5-minute periods (1.5 hours), while the 
slowest (Team 4) required 406 periods 
(33.8 hours). 

Response extinction. During extinction, 
the rate of correct responding for all six 
teams decreased to zero. This is indicated 
by the change in slope of the curves. Again, 
the variability in the time required to 
reach the extinction criterion was quite 
marked among the teams. Team 4 extin- 
guished most rapidly, in 17 5-minute pe- 
riods (1.4 hours), while Team 1 required 
157 periods (13.1 hours). For all teams, 


the extinction of the team response oc- 
curred even though 10 minutes of individ- 
ual warm-up practice under a continuous 
schedule of reinforcement was provided to 
all team members at the beginning of each 
daily session. Apparently, little transfer 
occurred between individual practice under 
a continuous reinforcement schedule and 
team practice on an extinction schedule. 
Under the first condition, proficiency was 
maintained but, in the other, proficiency 
declined over successive trials. 
Spontaneous recovery. All teams ex- 
hibited this phenomenon, although some 
more clearly than others. The small num- 
ber of trials associated with this condition, 
however, does not permit this phenomenon 
to be particularly evident in the curves. 
Teams 2 and 4 each required only three 
5-minute periods (15 minutes) to reach 
criterion for this phase, while Team 3 re- 
quired 113 periods (9.4 hours). 
Reacquisition. Figure 3 shows that, with 
reacquisition training, the teams improved 
in the rate of correct performance; how- 
ever, the shape of the curves is not con- 
sistent for the six teams. Five of the six 
teams, all except Team 5, reached criterion
in much less time than they did in initial 
acquisition. This is a typical occurrence 
when single organisms are retrained under 
similar conditions. For comparison pur- 
poses, the number of trials required for 
each team to reach acquisition and reac- 
quisition criteria may be seen in Table 2. 
Reextinction. Finally, in reextinction, 
Team 3 failed to achieve the prescribed 
criterion level. The variability among 
teams in number of trials to reach this cri- 
terion was high. Of the five teams which 
reached criterion, the fastest team, Team 
1, required only 20 5-minute periods (1.7 
hours) and the slowest team, Team 4, re- 
quired 200 periods (16.7 hours). Even after 
280 periods (23.3 hours), Team 3 failed to 
reach criterion. The characteristic single- 
organism phenomenon of more rapid reex- 
tinction than initial extinction was observed 
in only two teams, Teams 1 and 5.
The changes in team performance illus- 
trated in Figure 3 also are evident when 
team proficiency, the ratio of the number 
of correct team responses to the number of
attempts, is compared at various points 
during the experimental procedure. Table 
3 indicates the proficiency of each team, 


and the median proficiency for all six 
teams, at the beginning of acquisition, the 
end of acquisition, the end of extinction, 
the end of reacquisition, and the end of 
reextinction. Data on proficiency during
spontaneous recovery are not included be- 
cause of the very small number of trials 
required by several of the teams. Pro- 
ficiency values for the end of acquisition 
and for reacquisition were obtained on the 
four postcriterion trials in each of these 
phases (see procedure) and do not include 
the criterion trials themselves. Proficiency 
values for extinction and reextinction were 
obtained on the last 12 trials preceding the 
criterion trials; no postcriterion trials were
presented following these phases. The pro- 
ficiency values representing initial profi- 
ciency in the team setting were taken from 
the first 12 trials of this phase. Although 
there is considerable variability among 
teams during the 5-minute periods just fol- 
lowing or just preceding the criterion trials, 


TABLE 2

THE NUMBER OF 5-MINUTE PERIODS OF TRAINING
FOR EACH TEAM IN EACH EXPERIMENTAL
CONDITION

Team  Acquisition(a)  Extinction  Spontaneous recovery  Reacquisition(a)  Reextinction

 1         83            157              13                  31               20
 2         48             24               3                   9               91
 3         40             72             113                   8              280(b)
 4        406             17               3                  41              200
 5         18             88              16                  26               76
 6         67             41               8                  32              122

(a) Includes the four periods given after criterion was reached.
(b) Failed to reach reextinction criterion.


the median values given in Table 3 sug- 
gest that the influence of having presented 
or withheld reinforcement is pronounced. 
There is little likelihood that criterion per- 
formance for successive phases was ob- 
tained as an artifact of performance vari- 
ability. Even when the performance of each 
team is considered sequentially, there is a 
uniform pattern of rising and falling pro- 
ficiency as a consequence of the reinforce- 
ment contingencies then in effect. It also 
should be noted that many of the reported 
proficiencies were obtained by combining 
records from 2 adjacent days. 

The data in Figure 3 and Table 3 indi- 
cate that changes in team performance 
parallel predictions which could be made 
based on a knowledge of individual organ- 
ism behavior under similar training condi- 
tions. By controlling the occurrence of re- 
inforcement following each team response, 
it was possible to demonstrate patterns of 
learning phenomena usually associated
with individual organism performance. The 
teams showed response acquisition during 
reinforcement, performance decrement dur- 
ing response extinction, some evidence of 
spontaneous recovery, and less time to re- 
acquire the response than to acquire it ini- 
tially. The one dissimilarity between team 
performance and usual individual organism 
performance was that reextinction did not 
necessarily take place more rapidly than 
did the initial extinction of the team re- 
sponse. 
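
The two comparisons in this paragraph (reacquisition against acquisition, reextinction against extinction) can be checked directly from Table 2. The sketch below assumes the period counts as printed in Table 2; the variable and function names are illustrative only.

```python
from statistics import median

# 5-minute periods per team (Teams 1-6), copied from Table 2.
acquisition   = [83, 48, 40, 406, 18, 67]
extinction    = [157, 24, 72, 17, 88, 41]
spontaneous   = [13, 3, 113, 3, 16, 8]
reacquisition = [31, 9, 8, 41, 26, 32]
reextinction  = [20, 91, 280, 200, 76, 122]  # Team 3 never reached criterion

def hours(periods):
    """Convert a count of 5-minute periods to hours."""
    return periods * 5 / 60

# Teams that needed fewer periods the second time around.
faster_reacquisition = [team for team, (first, second)
                        in enumerate(zip(acquisition, reacquisition), start=1)
                        if second < first]
faster_reextinction = [team for team, (first, second)
                       in enumerate(zip(extinction, reextinction), start=1)
                       if second < first]

print("faster reacquisition:", faster_reacquisition)
print("faster reextinction:", faster_reextinction)
print("median extinction periods:", median(extinction),
      "=", round(hours(median(extinction)), 1), "hours")
```

Five of the six teams (all but Team 5) appear in the first list, while only Teams 1 and 5 appear in the second, which matches the pattern described in the text.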


TABLE 3 


STUDY I. AVERAGE PROFICIENCY AT THE END OF INDIVIDUAL TRAINING AND DURING
NONCRITERION TRIALS UNDER VARIOUS EXPERIMENTAL CONDITIONS


      Individual training  Predicted team proficiency  Team acquisition      Extinction  Reacquisition  Reextinction
Team  M1    M2    Op       (M1 x M2 x Op)  (M1 x M2)   Initial    Final      Final       Final          Final
                                                       12 Trials  4 Trials   12 Trials   4 Trials       12 Trials
H 87.5085 E „51 .24 .5l .09 .20 .09 
2. .55 .62 ..80 27 34 .18 AT .01 .29 OL 
8," TOA IOS secre .35 -48 .01 .65 6 54 10 
4 eg 002. SEO .30 42 .00 .85 .09 Ad .04 
5 .69 —.69  .63 .30 .48 .49 .86 a .52 .20 
6:41,A173 POS mAT .35 .50 .04 -16 .07 78 .07 
Mdn. .69 33 .48 .10 .49 .08 49 .08 


a For all 10 precriterion trials for this team.
b Did not reach extinction criterion.
Results: Molecular Analysis


A predicted level of performance for each 
team was calculated from proficiency meas- 
ures obtained for each individual team 
member just prior to beginning team train- 
ing. This prediction of team proficiency 
was computed using the multiplicative law 
of probability; it was assumed that the per- 
formance of team members was independ- 
ent and that team performance, when mem- 
bers were arranged in series, was a function 
of the probability that each team member 
would perform correctly on any one trial. 
For example, if all members of a team had 
a proficiency of .63 at the end of the indi- 
vidual phase of training, the predicted team 
proficiency would be (.63)(.63)(.63) = 
.25. For the monitors this measure of pro- 
ficiency was a reasonable estimate; for the 
operator, however, it was an underestimate,
since he could respond more than once in
attempting to produce a correct response.
This increased opportunity to respond
brought the operator's proficiency level
close to 1.00 in the team setting so that
team proficiency alternatively could be predicted
by the combined probabilities of the
two monitors alone. Table 3 shows the individual
proficiencies for M1, M2, and Op
and predicted team proficiencies using both 
the two-man and three-man team esti- 
mates. 
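
The two estimates differ only in whether the operator's proficiency enters the product. As a rough sketch, using the individual proficiencies listed in Table 3 for Teams 2 and 5 (the function name and data layout are assumptions made for illustration):

```python
def predicted_team_proficiency(m1, m2, op=None):
    """Multiplicative prediction for a series team.
    With `op` given, this is the three-man estimate (M1 x M2 x Op);
    with `op` omitted, the operator is treated as close to 1.00 and
    the two-man estimate (M1 x M2) is returned."""
    estimate = m1 * m2
    if op is not None:
        estimate *= op
    return estimate

# (M1, M2, Op) at the end of individual training, from Table 3.
teams = {2: (0.55, 0.62, 0.80), 5: (0.69, 0.69, 0.63)}

for team, (m1, m2, op) in teams.items():
    three_man = predicted_team_proficiency(m1, m2, op)
    two_man = predicted_team_proficiency(m1, m2)
    print(f"Team {team}: three-man = {three_man:.2f}, two-man = {two_man:.2f}")
```

The two-man figure is always the larger of the pair, since dropping a factor below 1.00 can only raise the product.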

The Predicted team proficiency columns 
of Table 3 contain estimates of each team’s 
performance based upon team member pro- 
ficiency during the last 12 5-minute periods 
of individual training. If these estimates 
are compared to the figures obtained during 
the first 12 5-minute periods of team train- 
ing in the Team acquisition: Initial column, 
it is evident that all but one team per- 
formed considerably lower than predicted. 
It is also evident from Table 3 that neither 
the two-man nor the three-man predictions 
of team proficiency agree with the initial 
or final estimates of team performance dur- 
ing acquisition, or show a relationship with 
the number of trials required by each team 
to reach the acquisition criterion (Table 
2). It might be hypothesized that the per- 
formance of a team when it was first 


formed should reflect the proficiency of its 
members so that a team composed of high- 
proficiency members would perform better 
initially than a team composed of low- 
proficiency members. It also might be as- 
sumed that a team which had high initial 
proficiency would reach criterion in fewer
trials than one which had low initial pro- 
ficiency. These hypotheses are central to 
the approach being examined in this paper, 
but the data collected for this first study 
are not appropriate to test either of them. 
Because of the individual training proce- 
dure used, all teams were comprised of 
members who had been roughly equated in 
proficiency so that those differences among 
teams evident in Table 3 largely reflect the 
random variability which prevented all 
teams from being as equal in proficiency as 
had been intended. Thus, no particular 
correlation between predicted and initial 
team proficiency was anticipated. 

On the other hand, there was a substan- 
tial decrease in proficiency for most of the 
teams as they began performing under 
team conditions. This difference between 
observed and predicted team proficiency 
can be analyzed in terms of the differences 
in the schedules of reinforcement in the individual
and team settings. For the individual
phase of training, all Ss were on a
continuous schedule of reinforcement, receiving
appropriate feedback following
every response. When the three team members
each reached at least a .63 level of
proficiency, they were combined into a
team. At this time, the probability of all
three of them performing correctly on any 
one trial would be about .25. This means 
that for any one member, only .25/.63 = .40 
of his correct responses would be reinforced, 
representing a considerable drop from the 
100% level of reinforcement for every cor- 
rect response provided during prior individ- 
ual training. This change in conditions of 
reinforcement can be hypothesized to be 
the critical factor in S’s performance decre- 
ment during the early periods of team 
training and the resulting poor level of team 
performance at that time. With subsequent 
practice, the team members did learn to 
perform more accurately so that the median 
TABLE 4

PERCENTAGE PROFICIENCY VALUES(a) FOR INDIVIDUAL SUBJECTS UNDER CONTINUOUS AND
PROGRESSIVE APERIODIC SCHEDULES OF REINFORCEMENT

          Continuous                 Aperiodic schedules
Subject   reinforcement(b)      15   25   35   40   45   50   55   60

Team 1
  M1            75              04   12   22        22        31
  M2            68              11   14   21        34        35
  M3            43              06   16   23        24        21
  Mean          62              07   14   22        26        29
Team 2
  M1            73                             21   34   49   53   43
  M2            59                             13   20   31   31   36
  M3            65                             58   29   35   39   50
  Mean          66                             31   28   38   41   43

(a) The number of reinforced responses divided by total responses.
(b) Based on the last 12 5-minute periods of individual training.


of the values in the Team acquisition: 
Final column in Table 3 more closely ap- 
proximates either median value of Pre- 
dicted team proficiency. 

If the hypothesis is correct that low initial
team performance is a function of the
change in the conditions of reinforcement 
for team members, a similar phenomenon 
should be observed in a single subject in a 
nonteam setting under simulated team 
training conditions. To examine this hy- 
pothesis, a supplementary study was under- 
taken. In this substudy, two teams of three 
monitors each received the first two train- 
ing phases, individual and pattern training, 
as did the members of the six previous 
teams. However, these teams were treated 
differently in the team training condition. 
Instead of working as a team, Ss were in- 
structed to continue performing as before, 
making only one press to each light pat- 
tern. The panel counters still remained 
operable; the wall counter was not used. 
The schedule of reinforcement presented to 
Ss as they responded to the light patterns 
was manipulated over a 5-day acquisition 
period. 

For one team, Ss received aperiodic reinforcement
for correct responses which averaged
.15 on Day One, .25 on Day Two,
.35 on Day Three, .45 on Day Four, and
.55 on Day Five. These values are the proportion
of correct responses that were reinforced;
for example, if S made 100 correct


responses on the first day, he would register 
15 points distributed randomly over the 
100 correct trials. For the second team of 
three monitors, Ss had an aperiodic rein- 
forcement schedule of .40, .45, .50, .55, and 
.60 on consecutive days. For both of these 
teams, this progressively increasing propor- 
tion of reinforcement for each S was analo- 
gous to the increase that was assumed to 
occur in the team setting as group perform- 
ance became more accurate due to the grad- 
ual recovery of individual team members 
from the temporary decrement produced by 
the change in schedule. The 10-minute 
warm-up practice given in the main study 
was provided at the beginning of each day's 
training. 
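
One way to picture the aperiodic schedule described above is as a random choice of which correct responses earn a point. The sketch below is an assumption about how such a schedule could be generated, not a description of the apparatus used in the study; the function name, the fixed seed, and the use of 100 correct responses per day are all illustrative.

```python
import random

def aperiodic_schedule(n_correct, proportion, rng):
    """Return one boolean per correct response; round(proportion * n_correct)
    of them, at randomly chosen positions, are marked as reinforced."""
    n_reinforced = round(proportion * n_correct)
    reinforced = set(rng.sample(range(n_correct), n_reinforced))
    return [i in reinforced for i in range(n_correct)]

rng = random.Random(0)  # fixed seed only so the sketch is repeatable

# Daily proportions for the two substudy teams (Day One to Day Five).
lower_schedule = [0.15, 0.25, 0.35, 0.45, 0.55]
higher_schedule = [0.40, 0.45, 0.50, 0.55, 0.60]

for day, p in enumerate(lower_schedule, start=1):
    trials = aperiodic_schedule(100, p, rng)
    print(f"Day {day}: {sum(trials)} of {len(trials)} correct responses reinforced")
```

On Day One of the lower schedule this yields 15 reinforced responses scattered over 100 correct trials, as in the example given in the text.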

The results of this substudy are pre- 
sented in Table 4. The data show that a 
change from continuous to aperiodic rein- 
forcement does result in a substantial ini- 
tial decrement in the proficiency of the Ss. 
The greater the change between the con- 
tinuous and aperiodic conditions, the more 
an individual’s performance is likely to de- 
cline. A greater initial performance decre- 
ment was observed in the three monitors 
who received the .15 aperiodic schedule 
following continuous reinforcement than 
those who received the .40 schedule. With 
the progressive increase in the proportion 
of reinforcement, there was an increase in 
the mean of Ss’ proficiencies except on the 
second day of the higher schedule. 
Conclusions


Study I analyzed team performance at 
two levels, molar and molecular. In the 
case of the molar analysis, where only team
performance was under consideration, it 
was observed that learning principles ap- 
propriate to single organisms were applica- 
ble to a team as a learning entity. At the 
molecular level, some factors were consid- 
ered which might explain the observations 
made at the gross, molar level. In particu- 
lar, the course of team response acquisition 
was explained tentatively on the basis of an 
initial response decrement which was a 
function of the change in the schedule of 
reinforcement following the shift from in- 
dividual to team training. The results of 
Study I suggest the following: 

1. It is feasible to view the team as a
single performing entity having response 
features which are directly affected by 
team response consequences. Multiman sys- 
tems appear to demonstrate response 
acquisition and extinction phenomena char- 
acteristically observed in individual organ- 
isms. These molar relationships provide a 
basis for investigating molecular relation- 
ships between individual proficiency, team 
communication arrangements and the con- 
sequent courses of team learning and team 
performance. 

2. Although all team members were pre- 
sented with individual practice at the be- 
ginning of each daily session, the teams 
failed to maintain their performance levels 
when group feedback was withheld during 
extinction. Apparently, Ss who perform in 
both individual and team conditions can 
discriminate between the two, and little or 
no individual proficiency transfers to the 
team setting. Although subject to further 
analysis, this hypothesis implies that after 
being trained to a criterion level by indi- 
vidual training methods, subsequent indi- 
vidual practice by team members may be 
relatively ineffective in influencing their 
performance as team members. What prob- 
ably is more important is practice in which 
feedback is furnished on the basis of the 
team product. 

3. The initial drop in performance in the 
team setting suggests that “ease of adjust- 


ment” to a group is a function of the change 
in the schedule of reinforcement to which 
a member is introduced; this change might 
be especially marked if the other members 
of the team are performing at relatively 
low proficiency levels. In order to minimize 
the effects of this change, it may be de- 
sirable to present individual feedback to 
team members about their own performance 
regardless of the team output. It can be 
hypothesized that under this condition Ss 
would not demonstrate the observed decre- 
ment when initially placed in the team set- 
ting. 


STUDY II: EFFECTS OF TEAM REINFORCEMENT
IN PARALLEL TEAMS


In contrast to the “series” teams investi- 
gated in the first study, Study II considered 
teams linked in "parallel." In a parallel
team, a correct response by any member 
can produce a correct team response. For a 
team with a parallel structure, the rein- 
forcing contingencies are more complex 
than they are for a series team since a 
team reinforcement can occur even when 
one or more members have made an incorrect
response. This potential for reinforcement
following incorrect responding is a
major property of teams where a group of 
individuals are all reinforced by a single 
event, the occurrence of which depends 
upon the integrated responding of some, 
but not all, of the members on any one 
trial. 

The defining property of a parallel team 
is the redundancy which exists among the 
members that compose the team. Team C 
in Figure 4 depicts a two-man series team. 
In Team D in Figure 4 an additional moni- 
tor has been added so that the two monitors 
are in parallel with each other and both 
are in series with the operator. Whenever 
one monitor performs correctly, the per- 
formance of the other monitor is redundant. 
(This arrangement typically is employed 
in performance situations where an incor- 
rect response might have serious conse- 
quences. Common sense suggests the over- 
all proportion of incorrect responses will be 
reduced if more than one team member is 
assigned to the same task.) If a team is
arranged in this way, a redundant member
can make an incorrect response and yet be 
reinforced along with all of the other mem- 
bers of the team following a correct team 
response. Over several reinforced trials, 
these incorrect responses are likely to be 
strengthened, and, on subsequent occasions, 
such incorrect behavior would have an in- 
creased probability of recurring. Under cer- 
tain conditions, it is probable that both 
monitors will exhibit an increase in the fre- 
quency of incorrect responses over a series 
of trials as a function of reinforcement fol- 
lowing such responses. Thus, although the 
addition of an extra team member in paral- 
lel with an existing member initially may 
result in a higher level of team perform- 
ance, with continued trials the redundant 
members are likely to show a decrement in 
performance. This decrement would result 
in a corresponding decrement in team out- 
put and would bring the level of team per- 
formance to a point equal to or possibly 
below that of the previous, nonredundant 
team. 

A decrement in the performance of parallel
teams can be hypothesized to be a
function of (a) the number of trials on 
which the redundant members were rein- 
forced for incorrect performance, and (b) 
the initial proficiency levels of the members 
and the relative magnitude of these profi- 
ciency levels among the members. Three 
specific hypotheses can be proposed: 

1. With the addition of a redundant 
member, team output will show an initial 
increase in the number of correct responses. 
The size of the increment will depend upon 
the current proficiencies of the team mem- 
bers: if proficiencies are high when a re- 
dundant member is added, increments in 
team performance will be small. For example,
if both the operator (Op) and the
monitor (M1) of a two-man series team
have high proficiencies, .86 and .90, respectively,
the probability that the team will
produce a correct response is (.86)(.90), or
.77. If a redundant monitor (M2) with a
proficiency level of .88 is added, the probability
of a correct team response is
Op[M1 + M2(1.00 - M1)], or .86[.90 + .88(1.00 - .90)],
which is equal to .85, an increase


FIG. 4. Reinforcement analysis of team when
redundant member is added.


of .08 in overall team proficiency. This is 
the probability that the operator along with 
either or both of the monitors will perform 
correctly on any one trial. On the other 
hand, if the original two-man team profi- 
ciencies are lower, e.g., the operator (Op) 
at .80 and the first monitor (M1) at .50,
the probability of a correct, two-man team
response is .40. The addition of a redundant
monitor (M2) with a proficiency level of
.88 will result in a parallel team in which 
the initial probability of a correct team 
response is .75, an increase of .35 in overall 
team proficiency. In general, it is hypothe- 
sized that when the original two-man team 
performs at low proficiency levels, the ad- 
dition of a redundant monitor can add sub- 
stantially to team output. 

2. With subsequent trials, the redun- 
dant team will show a performance decre- 
ment as a result of reinforcement for in- 
correct team-member responses. The rate 
of decrement will be a function of the ex- 
tent to which a redundant member's in- 
correct responses are reinforced; this, in 
turn, will be a function of the proficiency 
levels of each of the members. In order to 
explain this further, consider that in a 
series team of two or more members there 
are three possible response reinforcement 
contingencies. First, for each of the mem- 
bers of the series team a correct response 
can be followed by a team reinforcement;
this “appropriate” reinforcement occurs 
when all members perform correctly. Sec- 
ond, a correct individual response also can 
be followed by no reinforcement; these ex- 
tinction trials occur for a team member 
when he performs correctly but other mem- 
bers perform incorrectly. Third, an incor- 
rect response by a team member in a series 
team always must be followed by no rein- 
forcement since the team cannot possibly 
be correct when any member has performed 
incorrectly. 

In a parallel-redundant team, where only 
one of the monitors along with the operator 
needs to respond correctly for correct team 
performance, a fourth reinforcement con- 
tingency occurs in addition to the three 
possibilities which exist for a series team. 
When a monitor responds incorrectly on the 
same trial that the other monitor responds 
correctly, his incorrect response is followed 
by a team reinforcement. On any one trial, 
the probability that a particular kind of 
feedback contingency will occur for a spe- 
cific team member can be determined on 
the basis of the proficiency levels of all 
members of the team. For example, if M1 =
.70, M2 = .40, and Op = .80, team proficiency
(the probability of a correct team
response) for any one trial is .66. On this
trial, M1 will be appropriately reinforced
when both he and Op respond correctly,
which will occur with a probability of
(.70)(.80), or .56. For M2 the probability of
team reinforcement following a correct response
is .32. The probability of M1 receiving
a team reinforcement following an
incorrect response is the probability that
M1 is incorrect (1.00 - .70) on the same
trial that M2 and Op are both correct,
(.40)(.80); or (.30)(.40)(.80) = .10. For M2,
the probability of reinforcement following
an incorrect response is .34. (A complete
analytic description of team learning would 
need to consider the changing response 
probabilities of the members from trial to 
trial instead of just the momentary, one- 
trial state of affairs considered here.) 

The above, hypothetical data show that 
on trials when M1 receives a reinforcement,
the ratio of correct to incorrect responses
is .56 to .10, or approximately 6 to 1. For
M2 the ratio is .32 to .34, or approximately
1 to 1. It is evident that M1 and M2, especially
M2, have been placed in an environment
in which it may be difficult to maintain
the proficiency necessary for successful
team performance. As a result of partial
reinforcement for incorrect responses, it is
possible to hypothesize that both monitors 
will show an increase in the frequency of 
incorrect responses and, consequently, that 
the team will show a performance decre- 
ment. 

3. If one redundant member’s proficiency 
is very high, no decrement in overall team 
performance is expected. One redundant 
member can carry the load for other, parallel
members. For example, if M1 starts
with an initially high proficiency and M2
with a relatively low one, M2 should show
a faster decline in proficiency than M1. In
such a case, it is likely that the ratio of
correct to incorrect responses for which M1
is reinforced will change and approach a
more favorable one for maintaining the 
correct response. Thus, it can be hypothe- 
sized that for a team consisting of redun- 
dant members who have initially divergent 
proficiency levels, one very high and the 
other low, team performance would become 
primarily a function of the more proficient 
member and the contribution of the poorer 
member would become increasingly small. 
On the other hand, when the divergence in 
initial proficiency is not great, the perform- 
ance of both monitors can be expected to 
deteriorate concurrently (as described in 
Hypothesis 2 above). 
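
The worked examples under Hypotheses 1 and 2 can be reproduced with a short Python sketch. It follows the one-trial view taken in the text (independent members, fixed proficiencies, one operator response per trial); the function names and dictionary keys are hypothetical.

```python
def parallel_team_proficiency(op, m1, m2):
    """Probability of a correct team response when two monitors are in
    parallel with each other and in series with the operator:
    Op[M1 + M2(1.00 - M1)]."""
    return op * (m1 + m2 * (1.0 - m1))

def monitor_contingencies(op, me, other):
    """Single-trial probabilities of the four response-reinforcement
    outcomes for one monitor in a parallel-redundant team."""
    return {
        "correct, reinforced": me * op,
        "correct, not reinforced": me * (1.0 - op),
        "incorrect, reinforced": (1.0 - me) * other * op,
        "incorrect, not reinforced": (1.0 - me) * (1.0 - other * op),
    }

# Hypothesis 1: gain from adding a redundant monitor (M2 = .88).
high = parallel_team_proficiency(0.86, 0.90, 0.88) - 0.86 * 0.90
low = parallel_team_proficiency(0.80, 0.50, 0.88) - 0.80 * 0.50
print(f"gain with high proficiencies = {high:.2f}, with low = {low:.2f}")

# Hypothesis 2: M1 = .70, M2 = .40, Op = .80.
m1_outcomes = monitor_contingencies(0.80, 0.70, 0.40)
m2_outcomes = monitor_contingencies(0.80, 0.40, 0.70)
for name, outcomes in (("M1", m1_outcomes), ("M2", m2_outcomes)):
    ratio = outcomes["correct, reinforced"] / outcomes["incorrect, reinforced"]
    print(f"{name}: reinforced correct-to-incorrect ratio = {ratio:.1f} to 1")
```

The four outcome probabilities for each monitor sum to 1.00, and the ratios come out near 6 to 1 for M1 and near 1 to 1 for M2, in line with the figures in the text.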


Procedure 


Preliminary training. Six parallel teams of 
three Ss each worked for a median of 37 days 
(range 22 to 54 days). The team tasks assigned 
to the monitors and operator were the same as 
in the first study. The experiment consisted of the 
four phases of training shown in Table 5. Indi- 
vidual training and stimulus pattern training were 
identical to Study I. For the individual training 
phase, the median number of 5-minute periods 
required to reach criterion was 44 (3.7 hours); the 
range was 35 periods to 88 periods (2.9 to 7.3
hours). 

Two-man team training. Each day began with 
a 10-minute individual warm-up period, as in the 




first study. For the remainder of the one and one- 
half hour session, each monitor worked alternately 
for 15 minutes with the operator in a two-man 
series arrangement. During the time that one 
monitor worked with the operator, the other 
practiced alone with individual feedback. Both 
monitors received the same number of practice 
periods with the operator. Rest periods were in- 
serted, one after the first 25 minutes and one 
after 55 minutes. The criterion for this phase of 
the study was four consecutive 5-minute periods 
during which 15 or more team points were scored 
by each two-man team. The median number of 
5-minute trials to reach this criterion was 37 
(3.1 hours), and the range was from 9 to 45 5-min- 
ute periods (0.7 hours to 3.7 hours). 

Parallel team training. The Ss were instructed
that they would work as a three-man team to 
score points on the wall counter. The monitors 
were instructed to respond exactly as they had 
been doing, that is, with either a 2- or 4-second 
press for each light pattern. The operator was 
instructed to observe his panel lights. He was 
to respond with a 4-second press if he felt either 
or both of the monitors was correct and with a 
2-second press if neither of them was correct. The 
apparatus was adapted so that only one of the 
monitors and the operator needed to perform 
correctly in order for the wall counter and the 
bell to operate. A correct 4-second press by the 
operator scored a group point if either or both of 
the monitors was correct while his 2-second press 
advanced the program, presenting a new pattern 
to the monitors. The operator was permitted 
additional responses until either a team point 
was scored or until the stimulus pattern was ad- 
vanced. If the team scored a point, the stimulus 
pattern was advanced automatically. The first 10 


TABLE 5
DESCRIPTION AND CRITERIA OF THE TRAINING PHASES IN STUDY II

Training phase     Description                    Criterion

Individual         Same as Study I                Same as Study I

Pattern            Same as Study I                Same as Study I

Two-man team       Monitor One and Operator,      Four consecutive 5-minute
                   and Monitor Two and Operator   periods each with 15 or
                   of each team practiced as a    more team points.
                   two-man series team.

Redundancy team    The three subjects worked      Four consecutive 5-minute
                   as a three-man team with       periods each with 10 or
                   parallel monitors.             fewer team points.


minutes of each session were spent in individual 
practice of the 2- and 4-second presses as before. 

Since it was hypothesized that teams in a re- 
dundant arrangement would show a decrement 
with continued performance, the criterion for 
this phase was four consecutive 5-minute periods 
of reinforced team performance with 10 or fewer 
group points in each. The Ss were male high 
school students at least 16 years old who were 
paid one dollar an hour. The display-response 
panels and control apparatus were the same as 
that employed in the first study. 


Results: Molar Analysis 


Figure 5 shows the performance curves
for the six teams plotted in terms of per- 
centage proficiency per day (the number of 
correct team responses over the number 
of total team responses) for both the two- 
man teams and the subsequent three-man 
parallel teams. The median number of 5- 
minute periods for five of the six teams to 
reach the three-man team criterion was 186 
(15.5 hours), with a range from 111 to 437 
periods (about 9.2 to 36.3 hours). One team 
failed to reach criterion in 444 5-minute 
periods (37 hours), after which the experi- 
ment was terminated for that team. 

The proficiency of the two-man teams 
during the 12 periods preceding the criterion
trials at the end of two-man team
training and the proficiency of the three- 
man teams during both the first 12 periods 
at the beginning and the 12 last periods 
preceding the criterion trials at the end of 
the parallel team training are given in 
Table 6. As can be seen from the table, five 
of the teams demonstrated some increment 
in performance over that of either two-man 
component following the initial addition 
of a redundant team member. As a result 
of continued training, the proficiency of 
four of the six teams fell to below the level 
of performance observed when the three- 
man teams were first formed. The criterion 
for ending two-man team training required 
highly proficient performance, four con- 
secutive 5-minute periods each with 15 or 
more team points, and the criterion for end- 
ing redundancy training required a sub- 
stantial decrease in proficiency level, to 
four consecutive 5-minute periods each 
with 10 or fewer team points (see Table 5). 
Because the termination of these phases of
training was contingent upon criteria of
high and low performance which may not
have been representative of each team's
overall performance trend, the data in Table 6
describe proficiency during the 12 periods
(1 hour) preceding the criterion trials
and omit these trials entirely. If the criterion
trials had been used as the basis of
comparison, five of the six teams would
have shown an increment at the beginning
of three-man training and, of course, all
teams would have evidenced a decrement
at the time this phase was terminated.

FIG. 5. Percent proficiency for teams in Study II.

It might be argued that evidence of an 
increment in five of the six teams when the 
redundant member was added and evidence 
of a decrement in only four of the six teams 
when redundancy training was terminated 
does not support the hypotheses concerning 
the addition of a redundant member to a 
team. However, the increment was evident 
even when the criterion trials were not con- 
sidered, and the decrement occurred de- 
spite additional team practice which, with- 
out a redundant member, should have 
maintained the proficiency of all teams at 
high levels as a function of continued rein- 
forcement. The question of why all six 
teams did not decline is discussed below. 

Another possible explanation of the dec- 
rement is that after 22 to 54 days on the 
task, the motivation of the subjects was 
influenced and that boredom occurred lead- 
ing to performance deterioration. A reason- 
able refutation of this boredom hypothesis 
is provided by the data in Table 7. In this 
table, proficiency data for each monitor is 
given separately for each quarter of the 
trials during the sessions of three-man team 
practice. The column IP describes the proficiency
level of each monitor during the period
of individual practice which preceded
each day’s session. As can be seen from this 
column, individual proficiency was main- 
tained under individual practice conditions 


TABLE 6
STUDY II. AVERAGE PROFICIENCY DURING NONCRITERION TRIALS AT THE END OF
TWO-MAN TEAM TRAINING AND UNDER REDUNDANT CONDITIONS

              Nonredundant               Redundant team
Team     M1 + Op      M2 + Op        Initial       Final
        12 trials    12 trials      12 trials    12 trials

1          .55          .69            .82          .89
2          .89          .64            .94          .75
3          .91          .85            .94          .78
4          .71          .59            .63          .81
5          .69          .69            .92          [?]
6          .85          .88            .95          .91
Mdn.             [?]                   .93          .80


throughout the experiment. If boredom did 
occur, it did not manifest itself during individual
practice when continuous reinforcement
for correct responses was being
supplied. Since there is no evidence that 
boredom affected individual performance, 
it is difficult to attribute the team decrements
to boredom or other motivational effects.


Results: Molecular Analysis 


Detailed results on the performance of 
the monitors of each of the parallel teams 
are shown in Table 7. For each monitor, 
three sets of data are reported for successive 
quarters of the total number of team trials. 
The first column indicates the proportion 
of correct responses made by the monitor 
during the 10-minute warm-up period of in- 
dividual practice (IP) preceding each day’s 
team training. The second column indicates 
the proportion of correct responses made by 
the monitor during team performance (TP). 
The third column indicates the ratio of in- 
appropriately reinforcing feedback (IF) to 
appropriately reinforcing feedback (AF) 
received during team training. This entry 
was computed by dividing the number of 
trials on which the monitor received rein- 
forcement following an incorrect response 
by the number of trials on which he received 
a reinforcement following a correct re- 
sponse. The proficiency of the overall team 
is shown in the last column. 
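
The computation of the TP and IF/AF entries can be illustrated with a short Python sketch. It is not part of the original study: the function name `monitor_summary`, the trial records, and the encoding of each trial as a pair (monitor correct, monitor reinforced) are assumptions made here for illustration only.

```python
import math
from typing import Iterable, Tuple


def monitor_summary(trials: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Return (TP, IF/AF) for one monitor.

    Each trial is a hypothetical record (correct, reinforced):
    TP    = proportion of trials with a correct response
    IF/AF = reinforced incorrect trials / reinforced correct trials
    """
    records = list(trials)
    if not records:
        raise ValueError("at least one trial is required")
    n_correct = sum(1 for correct, _ in records if correct)
    n_if = sum(1 for correct, reinforced in records if reinforced and not correct)
    n_af = sum(1 for correct, reinforced in records if reinforced and correct)
    tp = n_correct / len(records)
    # The ratio is undefined when no correct response was reinforced.
    ratio = n_if / n_af if n_af else math.nan
    return tp, ratio


# Invented data: 6 correct responses (5 reinforced), 4 incorrect (2 reinforced).
example = (
    [(True, True)] * 5
    + [(True, False)] * 1
    + [(False, True)] * 2
    + [(False, False)] * 2
)
tp, if_af = monitor_summary(example)
print(f"TP = {tp:.2f}, IF/AF = {if_af:.2f}")
```

With these invented records the monitor is correct on 6 of 10 trials, and 2 reinforced incorrect responses are divided by 5 reinforced correct responses.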

As described in the molar analysis, there 
was a tendency for overall team proficiency 
to decline with continued practice by the 
redundant teams. This is particularly evi- 
dent in the results for Teams 2, 3, and 5. 
Team 6, as already reported, evidenced 
only a slight decline even after 39 days of 
training. Teams 1 and 4 reached criterion 
quickly after having maintained or even in- 
creased their proficiency. These trends in 
team performance tend to be reflections of 
the individual monitor data shown in Table 
7. Individual proficiency declined steadily 
for both monitors in Teams 2, 3, and 5 
during team practice (TP). At least one 
monitor evidenced little change or even an 
increase in proficiency in each of the remaining
teams. Whenever a monitor and
his partner both declined in proficiency, 
TABLE 7
CHANGES IN INDIVIDUAL AND TEAM PROFICIENCY IN REDUNDANT TEAMS

                          M1                      M2
Team  Days   Q      IP    TP   IF/AF        IP    TP   IF/AF    Overall team

1     11     1     .66   .56   [?]         [?]   .57   .42        .71
             2     .65   .60   .39         .74   .99   .42        .68
             3     .74   .59   .28         .81   .40   .87        .70
             4     .81   .56   .94         .92   .45   .70        .68

2     37     1     .91   .87   .10         .67   .63   .51        .89
             2     .86   .44   .20         .68   .97   .55        .84
             3     .80   .67   .26         .68   .52   .60        .76
             4     .87   .48   .58         .79   .53   .42        .69

3     10     1     .71   .71   [?]         .74   .85   .12        .94
             2     .61   .65   [?]         .74   [?]   .20        .90
             3     .61   .67   .32         .82   .63   .39        .85
             4     .71   .48   .58         .79   .53   .42        .71

4     30     1     .90   .73   .24         .76   .67   .36        .74
             2     .93   .78   .21         .70   .75   .26        .71
             3     .95   .89   .09         .69   .70   .39        .91
             4     .93   .84   .12         .07   .064  .48        .85

5     16     1     .82   .83   .15         .74   .72   .31        .93
             2     .82   .82   .15         .16   .65   .44        .90
             3     .83   .79   [?]         .85   .61   .50        .90
             4     .93   .60   .38         .80   .57   .46        .78

6     39     1     .83   .83   .18         .82   .85   .14        .92
             2     .94   .75   .25         .71   .75   .25        .90
             3     .88   .80   .15         .68   .74   .28        .90
             4     .86   .81   .17         .73   .76   .26        .93


there was a corresponding change in the ra- 
tio of inappropriate to appropriate rein- 
forcements (IF/AF). This was the case for 
Teams 2, 3, and 5. As has been indicated, 
no decline corresponding to the decrease in 
team proficiency is evident in individual 
performance not occurring in the team set- 
ting (IP). 

The results of Study II have shown that 
under certain conditions team reinforce- 
ment can eventuate in no further increase 
in correct responding or in a performance 
decrement both for the team and the indi- 
vidual team members. This outcome is a 
function of the team arrangement and the 
probability of correct responses by team 
members. Specifically, in the three-man re- 
dundancy arrangement, reinforcement is 
contingent upon a correct team response 
and not necessarily upon correct member re- 
sponses. This, in turn, permits the strength- 
ening of incorrect responses. In the two- 
man series arrangement used prior to the 
parallel team training phase, both members 
were reinforced whenever the team performed
correctly. Under those circumstances,
team responses were on a continuous
reinforcement schedule but correct
individual responses were on an aperiodic 
reinforcement schedule that depended upon 
the response probabilities (proficiency lev- 
els) of the individual members. With the 
addition of a redundant member, correct 
team responses remained on a continuous 
schedule which, for most teams, initially 
represented a higher reinforcement ratio 
than existed in the two-man teams. How- 
ever, individual reinforcement for the mon- 
itors became “confounded” because incor- 
rect responses were aperiodically reinforced. 
As predicted, whenever the competition of- 
fered by the increase in strength of incorrect 
responses occurred, it tended to result in an 
eventual performance decrement both for 
the individual and the team. 

This analysis of the effects of adding a 
redundant member to a team suggests the 
reasons why Team 6 did not reach the dec- 
rement criterion and why Teams 1 and 4 
did not demonstrate a stable decline in pro- 
ficiency. Teams 2, 3, and 5, which most 
clearly showed the predicted decrement, 
were characterized by wider differences in
initial monitor proficiencies in the team 
condition (TP) than Teams 1, 4, and 6. 
This wide range would facilitate reinforce- 
ment for incorrect responses and increase 
the likelihood of a decrement in team per- 
formance as described in the second hy- 
pothesis introducing Study II. 

The finding that individual proficiency 
shows differential changes over time de- 
pending upon whether performance takes 
place in an individual or team setting is of 
interest. With continued practice in the re- 
dundancy setting each monitor's proficiency 
tends to decline while his performance in 
the individual, 10-minute warm-up ses- 
sions tends to remain steady or even in- 
crease. It appears that certain differential 
cues in the two situations, including the 
conditions of reinforcement, can be dis- 
criminated. In the warm-up period, where 
reinforcement only follows correct respond- 
ing, proficiency is maintained at a generally 
high level. However, the team setting where 
reinforcement also can follow incorrect re- 
sponding results in more variable and, 
therefore, less proficient behavior. It is evi- 
dent that a skill such as “timing” can be 
performed in different situations with dif- 
ferent levels of proficiency. 

On the basis of the data in Study II, the 
following conclusions can be presented: 

1. The addition of a redundant member, 
that is, one whose performance requirement 
is identical to that of an already existing 
team member, tends to result in an initial 
increment in the team’s performance. This 
can be predicted in terms of an increase in 
the probability of correct performance when 
a parallel response component is added to 
the team. The amount of this initial incre- 
ment depends on the proficiency of the 
original team. If the proficiency of the 
original team is high, the addition of a re- 
dundant member will have little effect, es- 
pecially if the redundant member's per- 
formance level is low. On the other hand, if 
the proficiency of the original team is low, 
the addition of a redundant member will 
have a decided effect, often even if the re- 
dundant member himself has a relatively 
low proficiency. In general, the relationships
between the performance levels of the
original team and the subsequent performance
levels of a redundant team can be explained
in terms of the structure of the
team and the individual proficiencies of the 
members. 

2. As training is continued in the redun- 
dant situation, performance level is likely 
to return to or fall below that of the origi- 
nal, nonredundant team. This characteristic 
of redundant teams is explained by the reinforcing
condition which permits aperiodic
reinforcement for incorrect member responses.
The decrement may develop quite
slowly, occurring only after many repeti- 
tions of the team task or it may not mani- 
fest itself at all depending on how long 
team performance is observed and on the 
degree of divergence in the performances 
of the team members. There were no data 
in the present study to support the hypothe- 
sis that a team might not show a decrement 
if one member of a divergent team were 
to perform at a very high level of profi- 
ciency. In future studies, the selection of 
redundant members could be systemati- 
cally varied in order to control for the 
amount of difference in their respective pro- 
ficiencies. 
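
The dependence of the initial increment on the proficiency of the original team (conclusion 1) can be sketched numerically. The Python example below is an illustration only, not the authors' computation. It assumes that members respond independently, that a two-man series team is correct only when both monitor and operator are correct, and that the three-man redundant team is correct when the operator and at least one monitor are correct. The function names and all proficiency values are hypothetical.

```python
def series_proficiency(p_monitor: float, p_operator: float) -> float:
    """Two-man series team: both members must be correct (independence assumed)."""
    return p_monitor * p_operator


def redundant_proficiency(p_m1: float, p_m2: float, p_operator: float) -> float:
    """Three-man team with parallel monitors: the operator and at least one
    monitor must be correct (independence assumed)."""
    p_either_monitor = 1.0 - (1.0 - p_m1) * (1.0 - p_m2)
    return p_operator * p_either_monitor


P_OPERATOR = 0.95   # hypothetical operator proficiency
P_REDUNDANT = 0.50  # hypothetical low-proficiency redundant monitor

for label, p_original in [("high original team", 0.95), ("low original team", 0.60)]:
    before = series_proficiency(p_original, P_OPERATOR)
    after = redundant_proficiency(p_original, P_REDUNDANT, P_OPERATOR)
    print(f"{label}: {before:.3f} -> {after:.3f} (gain {after - before:.3f})")
```

Under these assumptions the same low-proficiency redundant monitor adds very little to an already proficient team and considerably more to a weak one, which is the pattern described in conclusion 1. The sketch says nothing about the later decrement, which depends on how reinforcement changes member proficiencies over trials.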

A “commonsense” analysis suggests that 
one method for increasing team proficiency 
is to add team members in parallel who 
would duplicate existing performance re- 
quirements. The data from Study II have 
shown that team proficiency may increase 
initially with the addition of redundant 
members. However, when this is done, a 
schedule of reinforcement is introduced in 
which a member’s incorrect responses may 
be reinforced and an eventual decrement in 
team proficiency may result. This can occur 
despite a schedule of continuous reinforce- 
ment for correct team responses. 


Discussion 


Variations in team performance can be 
described as a function of conditions both 
external to the group and within it. Exter- 
nal conditions refer to events which impinge 
upon the group from its outside environ- 
ment; internal conditions refer to the way 
the group is organized and the manner in 
which it functions, for example, its communications,
structure, and processing procedures.
The research described in this paper
provides preliminary evidence for lawful 
relationships at these two levels: one exper- 
iment involved a manipulation of external 
and one of internal conditions. The first 
study investigated changes in group per- 
formance which occur as a function of the 
external consequences of group behavior. 
The second study investigated changes in 
group performance which occur as the re- 
sult of an internal change in the manner in 
which the group is structured. 

If the performance of a group is consid- 
ered without reference to the performance 
of its members, it is apparent that one 
group may be better than another at an as- 
signed task or a given group may improve 
or become worse. The conditions which lead 
to such changes in group output comprise 
one level of analysis. A second level of anal- 
ysis can be thought of as concerned with 
the effect of the group as an environment 
in which individual performance occurs. 
Both the molar and the molecular levels of 
analysis can contribute to the description 
of group performance.

Considering only the group and not the 
responses of the individual members, two 
generalizations about group performance 
were identified in this research: 

1. The performance of a group is sensi- 
tive to, that is, a function of, the conse- 
quences of its performance. These conse- 
quences were defined as feedback which 
provides information as to the success or 
failure of the group’s previous actions. This 
result suggests that group or team practice 
without appropriate (i.e., differential) feed- 
back will be an insufficient condition to 
achieve or maintain group proficiency; 
practice alone may well lead to a decrement 
in group proficiency as a result of the ab- 
sence of reinforcement.’ Even for very high 
levels of initial team proficiency, some form 
of differential feedback must be utilized to 
prevent a deterioration in the quality of 
group composite responses. 

‘Thus, for an athletic team, it is not how well
the game is played that counts but whether the
game is won.

2. While group performance can be related
to its consequences, these consequences,
in turn, are insensitive to, that is,
not necessarily contingent upon, the success
or failure of individual member responses.
For example, in the first study reported, 
correct performance by only one partici- 
pant could not lead to group success even 
under acquisition conditions. In the second 
study, incorrect performance by one of the 
participants often was followed by group 
success. Furthermore, the group may evi- 
dence certain phenomena as the result of 
the consequences of its performance which 
are not necessarily consistent with what 
might be assumed from the study of indi- 
vidual behavior. As was illustrated by the 
performance decrement of teams in the sec- 
ond study, team performance may deterio- 
rate even when the team is supplied with 
continuous reinforcement for correct
responses.

It is possible to derive these same gener- 
alizations about group performance from 
an analysis of the changes in individual 
member performance which occur as a function
of the reinforcement contingencies experienced
by each member. Assuming external
conditions remain constant, these
contingencies and their influence upon team 
proficiency can be predicted from a knowl- 
edge of the structure of the team and the 
probability of correct performance by in- 
dividual members. In this sense, the group 
constitutes a specifiable environment in 
which individual performance can be stud- 
ied. 

When an individual performs in a non- 
group situation, increments or decrements 
in his proficiency occur as a result of the 
reinforcement he receives. The circum- 
stances under which reinforcement is pro- 
vided are a function of external influences 
established by the task situation (or by the 
experimenter in the laboratory) and by the 
momentary response probability of the in- 
dividual. In the team setting, the environ- 
ment external to each member takes on 
characteristics which reflect the group’s 
structure and the proficiency of other team 
members. Specifically, in a series team, the 
schedule of reinforcement for any one mem- 
ber is defined by the probability that all 
other members are correct whenever he is
correct; whether or not he is reinforced following
his correct response depends upon
the proficiency of the other members of his 
team. In a parallel team, the predominant 
condition is the likelihood of reinforcement 
for an incorrect member response which, 
again, is determined by the performance of 
the other team members. 

For example, consider a two-man team in 
which one member has a proficiency of .70 
and his partner has proficiency of .40. In a 
series arrangement, the first member will 
be reinforced only for 40% of his correct 
responses; his ratio of reinforcement is de- 
termined directly by his partner’s profi- 
ciency. In a parallel arrangement, his cor- 
rect responses always would be reinforced. 
However, he also would receive a reinforce- 
ment following 40% of his incorrect re- 
sponses, again a direct function of his part- 
ner’s proficiency. As the size of the team 
increases, or as the level of proficiency of 
other members decreases, the probability of 
a reinforcement following the correct re- 
sponse of an individual member of a series 
team is reduced. In a parallel team, the de- 
velopment of a response by a member which 
competes with his correct response similarly 
grows more probable as the size of the team 
and the proficiency of the other members 
increases. 
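
The arithmetic of this example can be written out as a short Python sketch. It is an illustration rather than anything used in the studies: it assumes that members respond independently and, as in the two-man example above, it ignores the operator. The function names are hypothetical.

```python
from math import prod
from typing import Sequence, Tuple


def reinforcement_probabilities(
    arrangement: str, others: Sequence[float]
) -> Tuple[float, float]:
    """Return (P(reinforced | member correct), P(reinforced | member incorrect))
    for one member, given the proficiencies of the other team members."""
    if arrangement == "series":
        # Reinforced only if every other member is also correct.
        return prod(others), 0.0
    if arrangement == "parallel":
        # Always reinforced when correct; reinforced after an error
        # whenever at least one other member is correct.
        return 1.0, 1.0 - prod(1.0 - q for q in others)
    raise ValueError("arrangement must be 'series' or 'parallel'")


def expected_if_af(p_member: float, arrangement: str, others: Sequence[float]) -> float:
    """Expected ratio of reinforced incorrect to reinforced correct responses.
    Undefined (division by zero) if correct responses are never reinforced."""
    after_correct, after_incorrect = reinforcement_probabilities(arrangement, others)
    return ((1.0 - p_member) * after_incorrect) / (p_member * after_correct)


# The example in the text: member at .70, partner at .40.
print(reinforcement_probabilities("series", [0.40]))
print(reinforcement_probabilities("parallel", [0.40]))
print(round(expected_if_af(0.70, "parallel", [0.40]), 3))

# Adding a second partner at .40 (a hypothetical three-member team).
print(reinforcement_probabilities("series", [0.40, 0.40]))
print(reinforcement_probabilities("parallel", [0.40, 0.40]))
```

With one partner at .40, the series member is reinforced after 40% of his correct responses and the parallel member after 40% of his incorrect ones. With two such partners the series figure drops to 16% and the parallel figure rises to 64%, in line with the effect of team size described above.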

To the extent that a team member’s en- 
vironment can be specified in terms of team 
arrangement (i.e., series or parallel), team 
size, and member response probability, it 
is possible to simulate this environment for 
an individual. This makes it possible to 
conduct a detailed analysis of the effects of 
various team-produced learning environ- 
ments on the performance of individual 
team members. In turn, the resulting per- 
formance of the individual team member 
will have predictable effects on team proficiency
and on the occurrence of team reinforcement.
Simulation of the conditions
of a group environment can be used to 
specify the interaction between the influ- 
ence of a team environment on an indi- 
vidual member and the influence of the 
performance of that member on the char- 
acteristics of his environment. For instance, 
it is possible to investigate the effect of 
schedules of reinforcement which are char- 


acteristic of particular team arrangements 
on individual performance and to vary in- 
dividual performance to determine its ef- 
fects upon the group environment and upon 
team proficiency. Some form of simulation, 
furthermore, can provide opportunities for 
exerting experimental control over the dy- 
namics (the trial-to-trial changes) of team 
and team member interactions so as to 
make the reinforcement contingencies of a 
team environment for any one trial respon- 
sive to the performance of the team mem- 
ber on the immediately preceding trials. 
The effects of various combinations of team 
member proficiencies, as these proficiencies 
change with prior team success, can be con- 
trolled experimentally and assessed in terms 
of the resulting increase or decrease in the 
probability of a correct individual and team 
response. Using such an approach, it is con- 
ceivable that group phenomena, at least 
those described in these studies, can be in- 
vestigated employing “one-member” teams. 
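
A "one-member" team of this kind might be sketched as the following Python simulation. It is a hypothetical illustration, not a procedure from these studies. The learning rule is an assumption introduced here: a reinforced correct response moves the member's proficiency toward 1, a reinforced incorrect response moves it toward 0, and unreinforced trials leave it unchanged (so extinction is deliberately omitted). The partner is simulated at a fixed proficiency, and the function name, learning rate, and trial count are arbitrary.

```python
import random


def simulate_member(
    p_start: float,
    partner: float,
    arrangement: str,
    trials: int = 2000,
    rate: float = 0.01,
    seed: int = 0,
) -> float:
    """Simulate one real member working with a simulated partner.

    Returns the member's proficiency after `trials` trials under the
    assumed (hypothetical) linear update rule described above.
    """
    if arrangement not in ("series", "parallel"):
        raise ValueError("arrangement must be 'series' or 'parallel'")
    rng = random.Random(seed)
    p = p_start
    for _ in range(trials):
        correct = rng.random() < p
        partner_correct = rng.random() < partner
        if arrangement == "series":
            reinforced = correct and partner_correct
        else:
            reinforced = correct or partner_correct
        if reinforced and correct:
            p += rate * (1.0 - p)  # assumed: correct response strengthened
        elif reinforced:
            p -= rate * p          # assumed: competing incorrect response strengthened
    return p


for partner in (0.0, 0.4, 0.9):
    final = simulate_member(0.70, partner, "parallel")
    print(f"parallel, partner at {partner:.1f}: final proficiency {final:.2f}")
```

Under this rule a parallel partner who is never correct cannot reinforce the member's errors, so the member's proficiency can only rise. Any decline in the simulated member therefore comes from reinforced incorrect responses, which is the contingency the text attributes to the parallel arrangement. The particular numbers printed depend entirely on the assumed rule and parameters.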

In the larger context of social psychol- 
ogy, studies of the kind reported here can 
provide insights into the differences be- 
tween a social environment and that tradi- 
tionally studied in investigations of indi- 
vidual learning. Performance in a social 
environment ultimately might be described 
as the consequences of particular reinforce- 
ment schedules which are found primarily 
in multiman systems and which differ from 
those which are typically investigated when 
studying individual performance. The meth- 
odology of team study reported in this 
paper might be extrapolated to the complex 
events characteristic of the tasks and struc- 
tures of more elaborate social organiza- 
tions. If the task of social psychology can 
be defined as the simultaneous investiga- 
tion of the way in which group membership 
affects individuals and the way in which 
individuals influence the groups to which 
they belong, then this methodology may 
lead to a better understanding of key as- 
pects of these kinds of behavioral changes. 


SUMMARY 
The major purpose of the studies described
in this monograph was to assess the
feasibility of considering a multiman team 
as a learning unit which reacts to reinforcement
contingencies in much the same way
as do individual organisms. Accordingly, a 
team response should exhibit acquisition 
and extinction as a function of the proper- 
ties of the reinforcement situation following 
each learning trial. Study I was designed to 
test this general hypothesis by determining 
the influence of the presence and absence 
of team reinforcement on the performance 
of a three-man team in which team rein- 
forcement was contingent upon correct re- 
sponses from all team members. Study II 
considered a more complex team structure 
in which team reinforcement was contingent 
on correct responses by only some of the 
team members. This study examined the 
interactive effects of group feedback upon 
individual member performance and its 
subsequent influence on overall team pro- 
ficiency. 

In both studies the units of investigation 
were three-man teams in which each mem- 
ber was assigned a specific task. The task 
situation was constructed so that no mem- 
ber received any feedback about the accu- 
racy of his own or any other member’s per- 
formance until the entire team completed 
the task. The response required from each 
team member was the estimation of a short 
time interval. 

In Study I, six teams of three Ss each 
were trained 1½ hours a day for a median
of 31.5 days. The first experiment followed 
an operant conditioning paradigm in which 
team response learning was investigated in 
terms of acquisition, extinction, spontane- 
ous recovery, reacquisition, and reextinc- 
tion. The data obtained from Study I in- 
dicated that changes in team performance 
conformed to predictions made on the basis 
of knowledge of individual organism behav- 
ior when the occurrence of reinforcement 
is controlled following each response. Using 
a simple probability model, a predicted 
level of performance for each team was cal- 
culated from individual team member pro- 
ficiencies. Early in team learning, differ- 
ences between observed and predicted team 
proficiency were obtained. This decrement 
was explained in terms of the change in 
schedules of reinforcement from the individual
to the team setting. With subsequent
team practice, observed team performance 
levels more closely approximated predicted 
levels. 

Study II investigated teams with a par- 
allel structure where the reinforcement con- 
tingencies were more complex than for the 
series teams investigated in Study I. The 
defining property of the parallel team is the 
redundancy which exists among team mem- 
bers. Whenever certain team members per- 
form correctly, the performance of other 
team members is redundant. Common sense 
often suggests that team errors will be re- 
duced by adding redundant members. How- 
ever, a redundant member can make an 
incorrect response and yet be reinforced 
along with all of the other members of the 
team following a correct team response. 
Over a sequence of reinforced trials, these 
incorrect responses are likely to be strength- 
ened and, on subsequent occasions, such 
incorrect behavior would have an increased 
probability of recurring. The primary pur- 
pose of Study II was to investigate this 
property of parallel teams. 

The results of Study II showed that the 
reinforcement contingencies set up by the 
structure of a parallel team could result in 
a performance decrement as a function of 
team arrangement and the probability of 
correct responses by team members. The 
data obtained in Study II indicated the fol- 
lowing characteristics of parallel team per- 
formance. The addition of the redundant 
member, that is, one whose performance re- 
quirement is identical with that of an al- 
ready existing member, tends to result in an 
initial increment in team performance. This 
can be predicted in terms of an increase in 
the probability of correct performance when 
a parallel response component is added to 
the team. As training is continued in the 
redundant situation, performance level 
may return to or fall below that of the 
original nonredundant team. This charac- 
teristic of redundant teams is explained by 
the reinforcing condition which permits 
aperiodic reinforcement for incorrect mem- 
ber responses. The decrement may develop 
quite slowly, occurring only after many 
repetitions of the team task. 
The discussion considers changes in team
performance as a function of conditions ex- 
ternal to and within the team. External 
conditions refer to events which impinge 
upon the group from its outside environ- 
ment and serve as team reinforcers; internal
conditions refer to the way the group
is organized and the manner in which it
functions, for example, its communications,
structure, and processing procedures which 
establish the learning environment for team 
members. The research in this monograph 
is seen to provide preliminary evidence for 
lawful relationships at these two levels. 


EXPERIMENTAL ANALYSIS OF RESPONSE SLOPE AND
LATENCY AS CRITERIA FOR CHARACTERIZING
VOLUNTARY AND NONVOLUNTARY RESPONSES
IN EYEBLINK CONDITIONING¹


KENNETH P. GOODRICH
Macalester College


Previous work on the identification of instrumental or voluntary (V) 
responses in eyeblink conditioning led to the use of a response latency 
criterion in some experiments and a response slope criterion in others. The 
present study (a) examines the relation between response latency and slope 
and (b) seeks by instructing some Ss to blink and others not to blink to 
develop rational criteria for the identification of V responses. Latency and 
slope were clearly not equivalent bases for this identification. Moreover, 
analyses of the data of Ss who received the special instructions showed that 
the conventional latency and slope criteria both had serious deficiencies. New 
criteria developed from these data were more successful but still of debatable 
value. The implications of these findings for the significance of eyeblink
conditioning research were discussed.


WHEN defined in terms of the procedures
which experimenters follow,
classical and instrumental conditioning experiments
are distinctly different. In the
procedure which defines instrumental con- 
ditioning, some object or event is made an 
outcome for some response of the subject 
(S). This outcome, which is controlled by 
the experimenter, comes to “control” the be- 
havior upon which it is contingent. In the 
procedure which defines classical condition- 
ing, no such contingent outcomes, or instru- 
mental contingencies, are specified. The 
conditioned stimulus (CS) and the uncon- 
ditioned stimulus (UCS) are presented to 
S regardless of what S happens to be doing. 
Psychologists have not generally been 
content with this procedural distinction, 
however unambiguous it may be. They have 
wanted to know whether different learning 
processes correspond to the different proce- 
dures. Both negative and affirmative an- 


¹ This work was partially supported by National
Science Foundation Grant GB-203 to the Univer-
sity of Pennsylvania and by National Institutes of
Health Grants MH-04528 and FR-00167 to the Uni-
versity of Wisconsin. Alice Isen, Nancy Miller,
and Susan Shuben ran the Ss and did preliminary
data tabulation. F. Robert Brush, Joseph Marko-
witz, and A. Martin Wall provided valuable assist-
ance and criticism at several points during the
investigation.


swers have been given to this query, and a 
variety of mechanisms have been proposed 
to make either answer consistent with the 
presence of contingent outcomes in instru- 
mental conditioning experiments and their 
absence in classical conditioning experi- 
ments. 

We must note carefully at this point that 
the absence of contingent outcomes in clas- 
sical conditioning experiments is an absence 
which occurs in the “rules” followed by the 
experimenter. Such an absence of contingent 
outcomes does not necessarily mean that 
effective temporal contiguities between re- 
sponses and controlling outcomes are absent 
in a particular classical conditioning experi- 
ment and not playing a major role in deter- 
mining the behavior which occurs in the 
situation. Hull (1943, pp. 74-79), for ex- 
ample, argued that since an unconditioned 
response (UCR) occurred in time after a CS 
and was accompanied by the UCS, his 
theory of instrumental learning could be 
applied to the classical conditioning experi- 
ment if the UCS were properly construed as 
a drive reducer. That is, an outcome (the 
presence of a positive UCS or removal of a 
negative UCS) follows the response and 
thus helps to attach the response to the CS, 
in spite of the fact that the experimenter 
presumably does not explicitly arrange an
instrumental contingency. Hull, then, did
not believe there was a classical condition- 
ing process. The procedural difference was 
not one that made any fundamental differ- 
ence. Without outcomes, responses would 
not be learned. 

Hull’s viewpoint has not been widely 
adopted, perhaps in part because of the 
experiments by Mowrer and his associates 
(Mowrer & Aiken, 1954; Mowrer & Solo-
mon, 1954) which have generally been in-
terpreted to cast doubt on Hull's interpreta-
tion of classical conditioning. Thus, it is
still a plausible working hypothesis that a 
classical conditioning process exists which 
does not require outcomes for the learned 
response. From this point of view, the re- 
sults of a classical conditioning experiment, 
as defined above, do not necessarily provide 
a clear picture of the classical conditioning 
process. To provide such a picture, it must 
be possible to show that there are no in- 
advertent, confounded, or “superstitious” 
temporal contiguities between response 
and outcome in any given classical condi- 
tioning experiment. 

One of the most widely employed classi- 
cal conditioning situations is eyeblink con- 
ditioning, as shown by even a hasty survey 
of the literature over the last 25 yr. and by 
the fact that many of the principles of con- 
ditioning discussed by Kimble (1960) gain 
a large measure of their support from such 
experiments. To the extent that one wants 
to regard these experiments as reflecting 
the process of classical conditioning (in the 
sense discussed above), one must argue that 
the temporal contiguities discussed above 
cannot reasonably be said to contribute to 
the essential features of the results. 

It would appear that of the several clas- 
sical conditioning situations most frequently 
encountered, eyeblink conditioning is the 
one for which it is most difficult to dismiss 
the possible role of confounded contiguities 
between response and outcome. To one ap- 
proaching eyeblink conditioning for the first 
time, whether a psychologist or a college 
sophomore serving as an S in such an ex- 
periment, the conclusion often seems in- 
escapable that one is dealing with instru- 
mental avoidance conditioning: by giving 


an anticipatory blink to the CS, S is miti-
gating the unpleasant effects of a puff of
air in the eye. Certainly there is no doubt 
that the eyeblink can be an instrumental 
act. We will most likely observe an instru- 
mental blink if we walk up to someone and, 
by means of a threat, make avoidance of a 
poke in the nose contingent on a blink. The 
question, then, is not whether the eyeblink 
can be brought under the control of its out- 
comes but rather whether in the classical 
conditioning situation it is under such con- 
trol. 


Voluntary and Conditioned Responses 


Given the possibility that the avoidance 
contingency does play some role in eyeblink 
conditioning, what course of action is in- 
dicated for one who wishes to study the 
classical conditioning process as discussed 
here? One line of work which has come to 
grips with this problem has attempted to 
distinguish voluntary (V) from conditioned 
(C) eyeblinks and to find ways of removing 
from conditioning data the V responses. It 
should be noted at this point that the as- 
sumption is made that “voluntary” in this 
context means much the same as “instru- 
mental” in our earlier discussion. More- 
over, it is assumed that the great majority 
of the voluntary or instrumental responses 
which one would encounter in an eyeblink 
conditioning experiment would arise through 
the avoidance possibility. It is likely, of
course, that V responses will arise for other 
reasons as well, such as misunderstanding 
of instructions or motives to “cross-up” the 
experimenter. The controlling outcomes for 
such responses are idiosyncratic and not 
directly a function of the experimental pro- 
cedures. These “unconfounded” outcomes 
are contrasted with the “confounded” out- 
comes produced by avoidance: the possible 
effects of an anticipatory response on the 
aversiveness of the UCS which follows it. 

The work reported in the present paper 
follows directly from previous work by 
Spence and Ross (1959) and Hartman and 
Ross (1961). These two papers, in turn, 
were based on earlier observations by 
Spence and Taylor (1951). Spence and Tay-
lor proposed a procedure for eliminating
V responses from the data obtained in eye-
blink experiments. On the basis of observa-
tions which were not presented explicitly,
they concluded that Ss who admitted to 
being aware of the purpose of the experi- 
ment typically gave responses with a char- 
acteristic form and with latencies shorter 
than 300 msec. Thus Spence and Taylor 
proposed that in a conditioning experiment, 
an S whose median latency of anticipatory 
responses was less than 300 msec. be dis-
carded.
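
The discard rule just described can be sketched in a few lines. This is only an illustration of the rule as stated in the text (discard an S whose median anticipatory-response latency is below 300 msec.); the function name and the sample latencies are hypothetical and do not come from any of the experiments discussed.

```python
from statistics import median

# Cutoff stated in the text for the latency discard criterion (msec.).
LATENCY_CUTOFF_MSEC = 300


def discard_by_latency(latencies_msec, cutoff=LATENCY_CUTOFF_MSEC):
    """Return True if an S's median anticipatory-response latency is below the cutoff."""
    if not latencies_msec:
        raise ValueError("S gave no anticipatory responses")
    return median(latencies_msec) < cutoff


# Hypothetical latencies (msec.) for two illustrative Ss; not data from the paper.
fast_s = [210, 250, 280, 310, 260]   # median 260 msec.
slow_s = [380, 420, 290, 450, 410]   # median 410 msec.

print(discard_by_latency(fast_s), discard_by_latency(slow_s))
```

Note that the rule operates on the S's median, so a single short-latency response does not by itself lead to discarding.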

Before continuing this historical intro- 
duction, we may pause to consider briefly 
the reason for discarding Ss rather than re- 
sponses in order to rid data of voluntary in- 
fluences. The answer appears to lie in the 
reasonable presumption that the occurrence 
of a V response on a given trial will pre- 
clude, or at least greatly attenuate the 
probability of, the occurrence of a C re- 
sponse on that trial. Therefore an estimate 
of the probability of a C response obtained 
with data from which V responses have been 
removed would be spuriously low. Because 
learning curves expressed in terms of esti- 
mated probabilities of responses have gen- 
erally been the main object of concern in 
eyeblink conditioning experiments, Ss, 
rather than responses, have been eliminated. 
The reader should remember, however, that 
the validity of such a procedure has ulti- 
mately been evaluated in terms of how 
successfully it manages to eliminate V re- 
sponses. 

For several years following the work of 
Spence and Taylor, the procedures recom- 
mended by these authors were adopted in 
many eyeblink conditioning experiments. In 
1959, Spence and Ross published an article 
in which data were presented in support of 
the latency criterion for identifying V re-
sponses. Two independent observers exam-
ined each response of each S in a condition-
ing experiment and on the basis of the form
of the eyelid closure judged the response to 
be voluntary, conditioned, or not scorable. 
The criteria employed by the judges (dis- 
cussed in greater detail below) presumably 
were based on responses of an unknown
number of Ss who had been instructed to 
blink; these data were not presented. How- 


ever, when frequency distributions were 
plotted for the latencies of the responses in 
the several judged categories, the latency 
procedure was shown to be a good one. That 
is, the majority of responses judged to be 
voluntary were eliminated by discarding 
Ss whose median latencies were less than 
300 msec. It was argued that the latency 
distinction was justified insofar as it re- 
sulted in the elimination of responses which 
were judged to be voluntary (and the simul- 
taneous retention of most of the responses 
judged to be nonvoluntary). 

Although various investigators continued 
to use the latency discard criterion, no in- 
formation was available concerning whether 
this criterion worked in the face of various 
kinds of experimental variations. In 1961, 
Hartman and Ross reported that for one 
such variation the latency criterion was 
quite inadequate. The latency procedure 
had been developed at the University of 
Iowa where a ready signal was always em- 
ployed in eyeblink conditioning experi- 
ments. Such experiments at the University 
of Wisconsin typically did not employ a 
ready signal (e.g., Grant & Schipper, 1952). 
Following his move to the University of 
Wisconsin, Ross collaborated with Hartman 
in seeking the answer to the question of 
whether the latency criterion worked as 
well without the ready signal as it did with 
it. Their answer was that it did not. The re- 
sults of their experiment showed that V re- 
sponses without a ready signal occurred 
with longer and more variable latencies 
than the V responses reported in the Iowa 
studies. Hartman and Ross argued that this 
was entirely reasonable since, as in a re- 
action-time experiment, a ready signal 
would permit S to be "set" and therefore to 
make his V responses faster. Without a 
ready signal, responses would vary in la-
tency and more of them would occur with
latencies longer than 300 msec. 

Hartman and Ross proposed an alterna- 
tive discard procedure. Noting that the 
judgments of response types were based on 
the shape or form of the response, they 
proposed that an objective measure of the 
rate at which the eye closed be used in dif-
ferentiating V from C responses. Specifi-
cally, they proposed that any S be discarded
whose median relative response slope was 
greater than 40%. The relative slope of any 
anticipatory response was defined as its 
maximum slope divided by the mean maxi- 
mum slope of that S’s UCRs on the first five 
trials. Hartman and Ross went on to show 
that if Ss were eliminated by this slope 
criterion the data were purified of judged 
V responses to about the same extent as had 
been true in the Spence and Ross experi- 
ment with the latency criterion. 
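
A minimal sketch of the slope rule described above may help fix the definitions: each anticipatory response's maximum slope is divided by the mean maximum slope of that S's UCRs on the first five trials, and the S is discarded if the median of these relative slopes exceeds 40%. The function names, the slope units, and the numbers are hypothetical illustrations, not values from Hartman and Ross.

```python
from statistics import mean, median

# Cutoff stated in the text for the relative-slope discard criterion (per cent).
SLOPE_CUTOFF_PCT = 40.0


def relative_slopes(response_max_slopes, ucr_max_slopes_first5):
    """Express each response's maximum slope as a percentage of the mean UCR maximum slope."""
    reference = mean(ucr_max_slopes_first5)
    return [100.0 * slope / reference for slope in response_max_slopes]


def discard_by_slope(response_max_slopes, ucr_max_slopes_first5, cutoff=SLOPE_CUTOFF_PCT):
    """Return True if the S's median relative slope is greater than the cutoff."""
    return median(relative_slopes(response_max_slopes, ucr_max_slopes_first5)) > cutoff


# Hypothetical slopes in arbitrary units; the UCR reference mean is 10.
ucr_first5 = [10, 12, 11, 9, 8]
shallow_s = [2, 3, 5, 1.5]   # relative slopes 20, 30, 50, 15 per cent
steep_s = [6, 5, 4.5]        # relative slopes 60, 50, 45 per cent

print(discard_by_slope(shallow_s, ucr_first5), discard_by_slope(steep_s, ucr_first5))
```

Dividing by the S's own UCR slope is what makes the 40% figure comparable across Ss whose eyelids close at different typical rates.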

In summary, work to date has made 
available two objective criteria for elimi- 
nating V responses from eyeblink condi- 
tioning data. One, used in experiments with 
a ready signal, employs the latency of the 
response. The other, used in experiments
without a ready signal, employs the slope 
of the response. Several important questions 
have not been answered. What is the rela- 
tion between the slope and the latency of 
the response? The presence or absence of 
a ready signal apparently affects response 
latency. Does it also affect response slope? 
Are slope and latency equivalent bases of 
classification when a ready signal is em- 
ployed? 


The Present Paper 


In the first part of the present paper a 
detailed examination is made of standard 
conditioning data, obtained with a ready 
signal, in terms of both the conventional 
latency and slope criteria for eliminating V
responses. This analysis indicates certain
discrepancies between the results of ap- 
plying the latency and slope criteria. These 
discrepancies lead to an attempt to isolate 
experimentally pure cases of V and C re- 
sponses and Ss in order to derive meaningful 
criteria. Following a reanalysis of the con- 
ditioning data in terms of some new pro- 
cedures, the current status of eyeblink 
conditioning is evaluated in light of the 
available methodology for identifying vol- 
untary processes. 

The results of two experiments are re- 
ported. The second experiment in large 
measure constitutes a replication of the 
first. Included with the data from the first 
experiment are data from a study conducted 


for quite a different purpose by Harold 
Fishbein as part of a doctoral dissertation 
(Fishbein, 1963). The analyses of Fish- 
bein’s experiment which are reported here 
were not a part of his thesis but were per- 
formed later on data generously supplied 
by Fishbein. The results of the two experi- 
ments will be presented separately but con- 
currently. In this way certain differences 
will be seen which provide some informa- 
tion about the effects of sampling and minor 
procedural changes, as well as some impor- 
tant invariances. 


METHOD


Apparatus 


The S was seated in a dental-type chair within 
a sound-insulating cubicle. Air conditioning fans 
produced an ambient noise level in S's cubicle of 
60 db. (re 0.0002 dynes/cm²). The E was located in
another room with the apparatus for controlling 
stimuli and recording responses. 

Each S received 60 trials which occurred with 
intertrial intervals of 15, 20, and 25 sec. in an 
irregular sequence. A set of electronic interval
timers controlled the stimulus presentations on 
each trial. The first event on any trial was a 
ready signal which consisted of a 400-msec. burst of
white noise (80 db.) from a speaker over S's 
head. Following a 2-, 3-, or 4-sec. interval, a visual 
CS was presented for 550 msec. on a display panel 
in front of S. The CS was produced by lighting a
neon lamp (GE NE40) behind a 19.5 × 19.5 in.
white translucent screen, causing a change in
luminance of a 2.5-in. diameter circular area on the
screen from a background level of 0.03 mL. to a 
stimulus level of 1.73 mL. In Fishbein's experiment 
and in Experiment 1, the CS appeared as an illumi- 
nated spot on a plain field. In Fishbein's experi- 
ment the spot occurred in the center of the field; 
in Experiment 1 it appeared either at the left or 
at the right in a balanced sequence (cf. Goodrich, 
1964b). In Experiment 2, the CS appeared as the 
illumination of a circular portion of screen defined 
by a 2.5-in. diameter cutout in the center of a black
ground which masked the remainder of the screen.²

The UCS was a 50-msec. puff of air which 
arrived at S's eye 500 msec. after the onset of the 
CS. To encourage voluntary responding, its inten- 
sity was set at the rather high level of 5 psi 
measured at the air tank some 15 ft. from S's eye.³


² Results reported elsewhere by the author
(Goodrich, 1964b) make it unlikely that differences
in the results among the subexperiments here were
the result of differences in the CS display.

³ That more voluntary responding occurs with
stronger air intensities was assumed by Spence 
and Ross (1959) and is generally consistent with 
the findings of Gormezano and Moore (1962). 


VOLUNTARY AND NONVOLUNTARY EYEBLINKS


The air was conducted through %¢-in. copper and 
glass tubing and was released with a solenoid valve 
in the line some 8 ft. from S's eye. The nozzle was 
positioned so as to direct the air at the corner of 
S's right cornea from a position just clear of the 
lashes. 
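
The within-trial timing given above can be laid out as a simple timeline. The durations are those stated in the text; the function name is hypothetical, and the sketch assumes that the 2-, 3-, or 4-sec. interval is timed from ready-signal onset, which the text does not specify.

```python
# Durations taken from the Apparatus description (msec. unless noted).
READY_SIGNAL_MSEC = 400
CS_MSEC = 550
CS_UCS_INTERVAL_MSEC = 500
UCS_MSEC = 50
FOREPERIODS_SEC = (2, 3, 4)


def trial_timeline(foreperiod_sec):
    """Return event times (msec.) for one trial, relative to ready-signal onset."""
    if foreperiod_sec not in FOREPERIODS_SEC:
        raise ValueError("foreperiod must be 2, 3, or 4 sec.")
    # ASSUMPTION: the foreperiod is measured from ready-signal onset.
    cs_on = foreperiod_sec * 1000
    ucs_on = cs_on + CS_UCS_INTERVAL_MSEC
    return {
        "ready_on": 0,
        "ready_off": READY_SIGNAL_MSEC,
        "cs_on": cs_on,
        "cs_off": cs_on + CS_MSEC,
        "ucs_on": ucs_on,
        "ucs_off": ucs_on + UCS_MSEC,
    }


timeline = trial_timeline(3)
print(timeline)
```

One consequence of the stated values is that the CS and the UCS terminate together, since 500 msec. plus the 50-msec. puff equals the 550-msec. CS duration.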

Responses were recorded with a system which 
employed a microtorque potentiometer coupled to 
the lid of S's right eye with a light thread, a 
plastic “false eyelash,” and a strip of adhesive tape. 
Voltage changes produced by resistance changes 
in the potentiometer were amplified and recorded 
on a Brush oscillograph in the manner described 
by Goodrich, Markowitz, and Norman (1964). 

Before running each S the relation between 
stimulus events and the corresponding polygraph 
tracings was checked. Also checked was the calibra- 
tion of the interval between light onset and 
arrival of the air puff at S's eye. 


The Experimental Conditions 


The several experimental treatment condi- 
tions were differentiated largely by the instruc- 
tions to S concerning the task. These conditions 
were as follows: 

Conditioning. The instructions for Ss in this 
condition were essentially those used in the Iowa 
laboratory. Briefly, they informed S that E was 
concerned with studying "attention and readiness," 
that this was achieved by observing the reaction 
of S's eyes to stimulation, that a light stimulus 
would come on and a puff of air would be 
delivered to the eye, that a ready signal would 
occur and was to be followed immediately by a 
voluntary blink, and that S was not to try to 
control the responses of his eye to stimulation but 
let the responses of his eye “take care of them- 
selves." Finally (as if this were possible), S was 
asked to relax and think about other things. 

Instructed-inhibit. The Ss in this condition
received essentially the same instructions as just 
described with the very important difference that 
they were told that their task was not to blink 
while the light (CS) was on. They were told that 
following each trial they would be informed, by 
the onset of one of two lights beneath the stimulus 
display panel, whether they had succeeded in not 
blinking. In addition, they received a reward for 
each success ($.01 in Fishbein’s experiment, $.05
in Experiment 2) and a punishment for each
failure (loss of $.01 in Fishbein’s experiment and
of $.05 in Experiment 2). The Ss in Experiment 2
started with a credit of $2.00; those in Fishbein’s 
experiment started with $1.00. 
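
As a small arithmetic illustration of the payoff scheme for the Instructed-inhibit Ss, the final credit can be computed from the counts of successes and failures. The starting credits and per-trial stakes are those stated above; the function name and the example split of 45 successes and 15 failures are hypothetical, and the sketch assumes every one of the 60 trials was scored as either a success or a failure.

```python
# Payoff parameters stated in the text, expressed in cents to avoid rounding error.
EXPERIMENT_2 = {"start_cents": 200, "stake_cents": 5}
FISHBEIN = {"start_cents": 100, "stake_cents": 1}


def final_credit_cents(n_success, n_failure, start_cents, stake_cents):
    """Starting credit plus the stake for each success, minus the stake for each failure."""
    return start_cents + stake_cents * (n_success - n_failure)


# Hypothetical outcome over 60 trials: 45 successes, 15 failures.
print(final_credit_cents(45, 15, **EXPERIMENT_2))  # cents under the Experiment 2 terms
print(final_credit_cents(45, 15, **FISHBEIN))      # cents under the Fishbein terms
```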

Instructed-blink. The Ss in this condition also 
received the basic instructions, with two important 
differences. First, these Ss were instructed to blink
“while the light is on.” Second, in one subcondition 
of the Instructed-blink condition the UCS was not 
employed. For these Ss, the nozzle was adjusted 
to the eye but no puff was ever delivered during 
the experiment and the puff was not mentioned 
in the instructions. The Ss in the remaining sub- 


condition did receive the puff and the appropriate 
puff instructions. 


General Procedures 


The S was taken into the sound cubicle and the 
headpiece was adjusted with no instructions and a 
minimum of informal conversation. Although E 
made no attempt to be austere or to frighten S, 
the experience probably was unsettling for most 
Ss, as Spence (1964) has suggested. The instruc- 
tions were then read and S was requested to ex- 
plain to E what he was to do and what would 
happen. The E reread or paraphrased portions of 
the instructions if S appeared not to understand. 
During the experiment proper, no questions con- 
cerning the purpose of the experiment were an- 
swered. The E entered S's room only on the rare 
occasions when the polygraph record indicated that 
the linkage between S's lid and the potentiometer 
was out of adjustment. At the end of the experi- 
ment, E attempted to get S to talk briefly about 
what S had been trying to do and what he thought 
the experiment was about. Beyond reporting that 
a wide variety of answers was given to this ques- 
tioning, ranging from apparent failure to note 
anything during the experiment to a rather precise 
analysis using the terminology of conditioning, no 
further use will be made here of this rather un-
systematic and incompletely recorded informa-
tion.⁴


Analysis of Data


The raw data consisted of polygraph 
tracings. Several sample recordings are re- 
produced in Figure 1. Each of the 12 panels 
shows a single trial, beginning just prior to 
CS onset and ending about 100 msec. fol- 
lowing UCS onset. An anticipatory response 
was defined as the occurrence of a down- 
ward deflection of the recording pen at 
least 1 mm. below the eye-open baseline in 
the 500-msec. interval following the onset 
of the CS. Since the recording gain was 
calibrated to yield one-to-one recording, 


⁴ The writer has informally compared two ways
of questioning S at the termination of an experi-
mental session. In the first, S is asked in the usual
way what he thought the experiment was about 
and what he was trying to do. In the second, S is 
told that his data will be of no value unless he 
knows what the experiment is about. The second 
mode of questioning seems to result in a very high 
incidence of Ss who describe rather accurately the 
classical conditioning nature of the experiment. 
Although these observations were not as sys-
tematic as one could wish, and although S may
figure out the experiment only after it is over 
under the second method of questioning, the 
writer believes that Ss normally know far more 
than they admit to knowing.

Fig. 1. Sample records of a variety of response patterns. Each record is a polygraph
tracing from one trial. 


the response criterion corresponded to a 
lid movement of 1 mm. Records B and K 
in Figure 1 show anticipatory responses 
with amplitudes of approximately 1 mm. 

Two scores were derived from the poly- 
graph tracings: (a) the latency of each 
anticipatory response was measured in 
terms of the number of millimeters sepa- 
rating CS onset from initiation of pen move- 
ment on the record (8.1 msec. per milli- 
meter); (b) a measure of response slope 
was also recorded. In pilot work, the de- 
termination of maximum slope with a pro- 
tractor and a table of tangents proved to be 
an extremely tedious task. (Hartman and 
Ross had used an electronic differentiator 
which provided an oscillograph tracing the 
amplitude of which was proportional to 
slope.) As an alternative, a measure was 
defined which could be obtained at the 
same time as the measure of latency with a 
transparent scoring template placed over 
the polygraph record. This measure was 
the distance the eye had closed 50 msec. 
following the initiation of the blink. 

In the case of brief responses which had 
clearly reached a maximum excursion prior 
to 50 msec. after their initiation, a linear 
extrapolation out to 50 msec. was made 
using as references the point of initiation 
and the point at which the maximum excur- 
sion had been achieved. In Figure 1, re- 


sponses such as those in Records B and 
K, if at least 1 mm. in amplitude, would 
have been treated in the manner just de- 
scribed. For other shallow responses, such 
as the one shown in Record L, no extrapola- 
tion was necessary because the pen was 
still moving downward 50 msec. following 
response initiation. 
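
The scoring rules described above can be sketched as follows. The chart calibration (8.1 msec. per millimeter), the 1-mm. response criterion, the 500-msec. CS-UCS interval, and the 50-msec. excursion window are taken from the text; the function names and sample values are hypothetical, and treating a response as anticipatory when it begins within the interval and reaches 1 mm. is a simplifying assumption about how the template was applied.

```python
MSEC_PER_MM = 8.1              # chart calibration stated in the text
CS_UCS_INTERVAL_MSEC = 500.0   # anticipatory responses fall in this interval
MIN_AMPLITUDE_MM = 1.0         # response criterion
SLOPE_WINDOW_MSEC = 50.0       # excursion is read 50 msec. after blink initiation


def latency_msec(chart_distance_mm):
    """Convert chart distance from CS onset to blink initiation into msec."""
    return chart_distance_mm * MSEC_PER_MM


def is_anticipatory(latency, amplitude_mm):
    """ASSUMPTION: a response counts if it starts in the interval and reaches 1 mm."""
    return 0.0 <= latency < CS_UCS_INTERVAL_MSEC and amplitude_mm >= MIN_AMPLITUDE_MM


def excursion_at_50_msec(time_to_peak_msec, peak_mm, closure_at_50_mm=None):
    """Lid closure (mm.) 50 msec. after initiation, extrapolating for brief responses."""
    if time_to_peak_msec <= 0:
        raise ValueError("time to peak must be positive")
    if time_to_peak_msec < SLOPE_WINDOW_MSEC:
        # Brief response: extend the line from initiation through the peak out to 50 msec.
        return peak_mm * SLOPE_WINDOW_MSEC / time_to_peak_msec
    if closure_at_50_mm is None:
        raise ValueError("closure at 50 msec. must be read from the record")
    return closure_at_50_mm


# Hypothetical brief response: peak of 1.5 mm. reached 25 msec. after initiation.
print(excursion_at_50_msec(25.0, 1.5))
```

The extrapolation treats a brief, shallow blink as if it had continued closing at its average rate, so the measure stays a rate of closure and is not capped by the response's small amplitude.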

Following Hartman and Ross, the ex- 
cursion measure for each anticipatory re- 
sponse was expressed as a fraction of the 
measure of that S’s UCR in order to com- 
pensate for variability among Ss in the rate 
of typical eye closures. The measures of the 
UCR were obtained on two extra trials, 
Trials 61 and 62, with an air puff but no 
CS or additional instructions. If one or both 
of these UCRs was confounded with an 
anticipatory blink, as in the case of many 
of the sample records in Figure 1, one or 
both of the UCRs from Trials 1 and 2 were 
used. The measure of a complete UCR, an
example of which is shown in Record A of 
Figure 1, was generally in the range of 12-
15 mm. The measure for anticipatory re-
sponses ranged from about 1.5 mm. (re-
sponses like those shown in Record K) to
values similar to those of UCRs (responses 
like those in Records D and E). 
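
The normalization just described might be sketched as below. The text does not say how the two UCR measures were combined, so this sketch assumes they were averaged, and it assumes that each confounded UCR from Trials 61 and 62 was replaced by one UCR from Trials 1 and 2. The function names and numbers are hypothetical.

```python
from statistics import mean


def ucr_reference_mm(late_ucrs, early_ucrs_mm):
    """Reference UCR excursion (mm.) for one S.

    late_ucrs: (excursion_mm, confounded) pairs for Trials 61 and 62.
    early_ucrs_mm: excursions for Trials 1 and 2, used only as substitutes.
    ASSUMPTION: usable measures are averaged; the text does not state this.
    """
    usable = [mm for mm, confounded in late_ucrs if not confounded]
    n_missing = len(late_ucrs) - len(usable)
    usable.extend(early_ucrs_mm[:n_missing])
    return mean(usable)


def relative_excursion_pct(response_mm, reference_mm):
    """Anticipatory-response excursion as a percentage of the S's UCR excursion."""
    return 100.0 * response_mm / reference_mm


# Hypothetical S: the Trial 62 UCR was confounded with an anticipatory blink.
late = [(12.0, False), (11.0, True)]
early = [13.0, 14.0]
reference = ucr_reference_mm(late, early)       # mean of 12.0 and 13.0
print(relative_excursion_pct(2.5, reference))   # a 2.5-mm. response relative to that UCR
```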

A scatter-plot is presented in Figure 2 
which relates relative excursion values (as 
defined above) on the ordinate to the rela-
tive tangents employed by Hartman and
Ross on the abscissa for the responses of 
10 Ss from an independent experiment con- 
ducted by the writer. The correlation is 
clearly high and the proportionality con- 
stant not too different from unity. As a 
further check on the comparability of the 
present slope procedures with those of Hart-
man and Ross (1961), a post hoc compari-
son was made, using the data from Con-
ditioning Ss in Experiment 2, of the UCR
excursion measures based on Trials 61 and 
62 with excursion measures based on Trials 
1-5. The reason for this comparison is 
that Hartman and Ross had used Trials 
1-5 to obtain UCR slope values whereas 
Trials 61 and 62 were used in the present 
study. The mean excursion measures were 
12.8 mm. and 12.6 mm. for Trials 1-5 and 
Trials 61-62, respectively, and the standard 
deviation of the 29 differences was 1.4 mm. 
It may be concluded that biases were prob-
ably not introduced by using Trials 61 and
62 instead of Trials 1-5. 
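
One rough way to see why the conclusion above is reasonable is to turn the reported summary values into a paired t statistic. This calculation is not reported in the paper; it is offered only as an illustrative check, and it assumes the 1.4 mm. figure is the sample standard deviation of the 29 within-S differences. The function name is hypothetical.

```python
import math


def paired_t_from_summary(mean_a, mean_b, sd_of_differences, n):
    """t = mean difference / (SD of differences / sqrt(n)), for paired measures."""
    standard_error = sd_of_differences / math.sqrt(n)
    return (mean_a - mean_b) / standard_error


# Summary values reported in the text: 12.8 mm. (Trials 1-5), 12.6 mm. (Trials 61-62),
# SD of the 29 differences = 1.4 mm.
t_value = paired_t_from_summary(12.8, 12.6, 1.4, 29)
print(round(t_value, 2))
```

A mean difference of 0.2 mm. is smaller than its standard error of roughly 0.26 mm., so the t value falls below 1, which is consistent with the statement that no appreciable bias was introduced.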

The term “slope” in this paper will refer 
to the relative excursion measure discussed 
above, and the comparability of this to the 
measure devised by Hartman and Ross will 
be assumed. 


Subjects 


All Ss were undergraduates at the Uni- 
versity of Pennsylvania. The data of 41 Ss 
were not used. In Experiment 1, 28 Ss were 
discarded because of procedural errors or 
apparatus failures, 3 because of failure to 
give at least one anticipatory response, and 
1 because of an excessive rate of random 
blinking. In Experiment 2, nine Ss were 
discarded, all because of procedural errors 
or apparatus failures. Fishbein (1963) re-
ported no discarding of Ss.

The remaining Ss were distributed among 
the experimental conditions as follows. 


Fig. 2. Scatter plot pooled over 10 Ss in a separate conditioning experiment showing the covariation
of the relative tangent and the relative excursion of anticipatory responses. (Abscissa: relative
maximum tangent, per cent.)

Thirty men served in Fishbein's Instructed-
inhibit group. In Experiment 1, 42 men
served under conditioning instructions, and
78 men served under instructions to blink,
36 with the UCS and 42 without the UCS.
In Experiment 2, 52 Ss (34 men and 18 
women) served under instructions not to 
blink, 18 Ss (11 men and 7 women) served 
under instructions to blink (without the 
UCS), and 29 Ss (20 men and 9 women) 
served under conditioning instructions. 

The Ss in Experiment 1 “volunteered” to 
serve without pay as a course requirement. 
The Instructed-blink and Conditioning Ss 
in Experiment 2 volunteered to serve for a 
fee of $2.00. The Ss who served under 
instructions not to blink were volunteers 
who received a variable sum of money 
which depended on how well they succeeded 
in not blinking (as described above).


RESULTS 


Slope and Latency Analyses of Condition- 
ing Data 


Most of the analyses in the present re- 
port are based on frequency distributions 
of response latencies and slopes. Many of 
these distributions were compiled by com- 
bining data over both Ss and trials, a 
procedure generally followed in earlier 
work. Clearly, errors from averaging may 
be introduced by this procedure. Nonethe- 
less, for individual Ss and small blocks of 
trials there were insufficient numbers of 
responses to make reliable inferences about 
the shapes of frequency distributions. In a 
later part of this report there will be oc- 
casion to comment on the relation in the 
present data between individual and 
grouped distributions. 

Distributions for Trials 1-30 and Trials
31-60. Figure 3 presents the frequency 
distributions of response latency for Ss 
run under conditioning instructions in the 
two experiments. To show how these dis- 
tributions changed with trials, the distribu- 
tions were plotted separately for Trials 
1-30 and Trials 31-60. Between the first 
half and the second half of the training ses- 
sions there was a decrease in the frequency 
of responses with latency less than about 
150 msec. and an increase in the frequency 


Fig. 3. Distributions of response latencies for
the Conditioning Ss in Experiment 1 (N = 42)
and Experiment 2 (N = 29) plotted separately for
Trials 1-30 and Trials 31-60. Numbers in paren-
theses in this and succeeding figures are numbers
of responses in distributions. (Abscissa: latency
in milliseconds; ordinate: number of responses.)


of responses with latency greater than about 
200 msec. During the second half of the ses- 
sion the principal mode of the latency dis- 
tributions occurred somewhere after 400 
msec., and there was a clear skewness to the 
left. The secondary mode, occurring in the 
region of around 120 msec., presumably re- 
sulted from responses occurring as uncon- 
ditioned reactions to the CS. This mode had 
largely disappeared by the second half of 
the training session. 
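
The pooled distributions described above amount to tallying response latencies into class intervals separately for the two halves of the session. The sketch below shows one way to do this; the 50-msec. class interval is an assumption (the text does not state the interval used), and the function name and sample responses are hypothetical.

```python
from collections import Counter


def latency_histograms(responses, bin_msec=50):
    """Tally latencies by class interval for Trials 1-30 and Trials 31-60.

    responses: iterable of (trial_number, latency_msec), pooled over Ss.
    Each key of a tally is the lower bound (msec.) of its class interval.
    ASSUMPTION: a 50-msec. class interval; the paper does not give one.
    """
    halves = {"1-30": Counter(), "31-60": Counter()}
    for trial, latency in responses:
        half = "1-30" if trial <= 30 else "31-60"
        lower_bound = int(latency // bin_msec) * bin_msec
        halves[half][lower_bound] += 1
    return halves


# Hypothetical pooled responses: (trial, latency in msec.).
sample = [(3, 120), (10, 130), (45, 420), (50, 440), (55, 260)]
histograms = latency_histograms(sample)
print(dict(histograms["1-30"]), dict(histograms["31-60"]))
```

Because the tallies pool over Ss as well as trials, a mode in the pooled distribution need not correspond to a mode in any individual S's distribution, which is the averaging caution raised earlier in this section.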

The properties just presented describe the 
results of both Experiment 1 and Experi- 
ment 2, and are similar to those reported 
by other investigators. In one respect the 
results of the two experiments seem to dif- 
fer. In Experiment 1 there was a single 
major mode in the region after about 150 
msec. In Experiment 2, on the other hand, 
there apparently were two modes in this 
region, one occurring in the vicinity of 250 
msec. and the other in the region of the cor-
responding mode in Experiment 1. The la-
tency value previously used to differentiate
V from C responses was 300 msec., a point
which fell roughly between the two modes 
in the results of Experiment 2 but had no 
apparent significance with respect to the 
distribution in Experiment 1. 

In Figure 4 are presented the slope dis- 
tributions for the two experiments plotted 
separately for Trials 1-30 and Trials 31-60. 
Here may be noted a major mode in both 
experiments in the region of 15% relative 
slope, and a marked skewness to the right. 
The shape of these distributions did not 
appear to change systematically with the 
addition of more trials. The slope value 
used to differentiate V from C responses in 
previous work was 40%. Only in Experi- 
ment 2 was there any suggestion of a pos- 
sible bimodality about the 40% point. 

Fig. 4. Distributions of response slopes for the
Conditioning Ss in Experiment 1 (N = 42) and
Experiment 2 (N = 29) plotted separately for
Trials 1-30 and 31-60. Data include only responses
with latencies greater than 150 msec.

Distributions of V and C responses. The
application of the conventional latency and
slope criteria for differentiating V and C
responses is shown in Figures 5 and 6. To
minimize the effect of the unconditioned re- 
sponses to the CS, the data in these figures 
are based on only the last half of the train-
ing session.⁵ Figure 5 presents the frequency
distributions of response latencies for re- 
sponses identified as voluntary and con- 
ditioned according to the 40% slope cri- 
terion. Several features of the data in Figure 
5 are consistent with the previous work by 
Spence and Ross (1959) and Hartman and 
Ross (1961). In particular, we note the occurrence
of more C-slope than V-slope responses
in the portion of the distribution
beyond 300 msec. and more V-slope than 
C-slope responses prior to 300 msec. The 
latter effect is most marked in the data from 
Experiment 2. These distributions are not 


⁵ We may note in passing a confirmation of
other work (Goodrich, 1964a) showing that the
original responses to the CS are predominately 
shallow in slope. Thus they resemble in this respect 
the C response rather than the V response or the 
UCR. 

... Conditioning Ss in Experiment 1 (N = 42) and
Experiment 2 (N = 29). Data were obtained from
Trials 31-60 and include only responses with ...

precisely comparable to the latency distributions
presented by Spence and Ross
and by Hartman and Ross, since in the 
work of these investigators responses were 
differentiated on the basis of judged cate- 
gories whereas in the present study they 
were differentiated on the basis of meas- 
ured response slope. Hartman and Ross, 
however, demonstrated good agreement be- 
tween judged form and measured slope. 
The presence of a mode for V-slope re- 
sponses in the vicinity of 200 msec. in Fig- 
ure 5 is consistent with the latency distribu- 
tions reported by Spence and Ross in which 
responses judged to be voluntary had a 
mode between 200 and 300 msec. Nonethe- 
less, the distributions shown in Figure 5 
contrast in an important respect with the 
data of Spence and Ross. A sizable proportion
of the V-slope responses occurred beyond
300 msec. in the present experiments
(69% and 55%), whereas a much smaller
proportion occurred in this region in the 
data of Spence and Ross (less than 17%). 
In this respect, the present results resemble 
more closely the results reported by Hart- 
man and Ross than those reported by 
Spence and Ross, in spite of the fact that 
the latter employed a ready signal as in 
the present study. Several other sets of data 
obtained by the writer have consistently 
shown a sizable proportion of V-slope re- 
sponses in the area beyond 300 msec. Al- 
though it has recently been confirmed that 
a ready signal produces more V-slope re- 
sponses with short latencies than does no 
ready signal (Goodrich, 1964a), average 
latency apparently is not invariant with 
other, unknown, variations among experi- 
ments. 

Frequency distributions of response 
slopes are presented in Figure 6 for re- 
sponses identified as voluntary and condi- 
tioned on the basis of the 300 msec. latency 
criterion. The great majority of responses 
in both experiments were identified as con- 
ditioned rather than voluntary according 
to this criterion. These C-latency responses 
occurred with greatest frequency in the re- 
gion of rather shallow slopes and decreased 
in frequency to produce a marked skewness 
toward the right. In contrast, the smaller 
number of responses identified as voluntary 
had a slope distribution of quite a different 
shape. If any mode may be identified, it 
clearly lies toward the steep end of the 
slope continuum rather than toward the 
shallow end. This is particularly clear in 
the case of Experiment 2 where a mode oc- 
curred around 75% slope. Examination of 
these data showed that the responses which 
produced the mode at 75% in the slope 
distribution were generally the same re- 
sponses which produced the mode at 200 
msec. in the latency distribution. For unknown
reasons, this phenomenon was absent
in Experiment 1; only Experiment 2 pro- 
duced a sizable group of short-latency, 
steep-slope responses.

Relation between slope and latency criteria.
It is clear from the preceding two
figures that although in several respects our 
distributions of slope and latency are similar
to those previously reported, there are
a disturbing number of instances in which 
responses meet only one of the two conven- 
tional criteria and not both. These findings 
are summarized in Figure 7, which presents 
the relative number of responses in the four 
possible categories formed by jointly classi- 
fying responses according to both latency 
and slope. Approximately 65% of the re- 
sponses were categorized alike by the two 
criteria; 35% of the responses thus met one 
but not both of the criteria. The more 
numerous type of inconsistent classification 
was that involving V slope and C latency. 
This finding reflects the same fact referred 
to above in our examination of the latency 
and slope distributions: the presence of a 
large number of responses which had slopes 
greater than 40% and latencies longer than 
300 msec. 
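
The joint classification summarized above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the scoring procedure used in the experiments: the function names and the example records are hypothetical, and the treatment of values falling exactly on a cutting point (counted here as conditioned) is an assumption the text does not settle.

```python
from collections import Counter

LATENCY_CUT_MSEC = 300.0  # conventional latency criterion from the text
SLOPE_CUT_PCT = 40.0      # conventional relative-slope criterion from the text
MIN_LATENCY_MSEC = 150.0  # only responses with longer latencies are analyzed


def classify(latency_msec, slope_pct):
    """Return (slope label, latency label), each 'V' or 'C'."""
    slope_label = "V" if slope_pct > SLOPE_CUT_PCT else "C"
    latency_label = "V" if latency_msec < LATENCY_CUT_MSEC else "C"
    return slope_label, latency_label


def joint_proportions(responses):
    """Proportion of retained responses in each of the four joint categories."""
    kept = [(lat, slope) for lat, slope in responses if lat > MIN_LATENCY_MSEC]
    if not kept:
        raise ValueError("no responses with latency above the minimum")
    counts = Counter(classify(lat, slope) for lat, slope in kept)
    cells = [("V", "V"), ("V", "C"), ("C", "V"), ("C", "C")]
    return {cell: counts[cell] / len(kept) for cell in cells}


def agreement(proportions):
    """Share of responses labeled alike by the two criteria."""
    return proportions[("V", "V")] + proportions[("C", "C")]


# Hypothetical (latency in msec, relative slope in %) records, for illustration only.
example = [(200, 75), (420, 15), (450, 10), (380, 60),
           (250, 20), (410, 55), (120, 80), (470, 12)]
props = joint_proportions(example)
print(props, agreement(props))
```

In these made-up records the most frequent disagreement is the cell keyed `("V", "C")`, steep slope with long latency, which is the kind of inconsistency the paragraph above describes.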

When the conventional criterion for iden- 
tifying voluntary Ss (those with at least 
50% V responses) was applied, the results 
resembled those in Figure 7. Several in- 
stances were found in which Ss met either 
the slope or latency criterion, but not both. 
Approximately 75% of the Ss were cate- 
gorized alike by the two criteria; 25% were 
inconsistently classified. In all cases, the 
latter Ss were voluntary by the slope cri- 
terion but not by the latency criterion. 

Effect of discarding V Ss. The data pre- 
sented in Figure 7 were based on the entire 
set of responses in each of the two experi- 
ments. We turn now to an examination of 
the effects upon the relative frequency of 
the four kinds of responses as Ss were elim- 
inated who failed to satisfy various criteria 
for inclusion in the sample. Figures 8 and 
9 summarize the relevant data. Figure 8 
represents the effects of discarding Ss on 
the basis of response latency. In this case 
an S would be discarded from the experi- 
ment if a certain proportion of his response 
latencies fell in the region below 300 msec. 
The right end of each abscissa in Figure 8 
corresponds to the stiffest of all criteria. At 
this point all Ss would be discarded who 
had 0% or more V-latency responses. Mov- 
ing to the left on the abscissa the discard 
criteria become successively more lenient, 
until at the far left end of the scale an S 

Fig. 7. Proportions of responses falling in the
four categories formed by jointly classifying re- 
sponses by the conventional slope and latency 
criteria. Data were obtained from the Condition- 
ing Ss in Experiment 1 (N = 42) and Experiment 
2 (N = 29) over Trials 1-60. Only responses with 
latencies greater than 150 msec. were included, 
1235 in Experiment 1 and 745 in Experiment 2. 


would be discarded only if 100% of his 
responses were voluntary by the latency 
criterion. The top panels in Figure 8 reflect
the gross effects of applying the various 
discard criteria. The filled circles show the 
proportion of total responses which would 
remain after application of the criteria on 
the abscissa. The open circles show the 
proportion of Ss which would remain.
The lower panels in Figure 8 show the 
relative frequency of the four kinds of re- 
sponses in the samples which remain after 
applying each discard criterion.⁶ At the far
left, where almost no Ss are discarded, we 
see proportions which are nearly identical 
to those previously presented in Figure 7. 
Moving to the right, it is apparent that as 
the discard criterion becomes more stringent 


⁶ Spence and Ross (1959) and Hartman and
Ross (1961), in analogous plots, presented the
relative frequencies of V and C responses in the
data which were eliminated by the various discard
criteria. It would seem more important, however,
to be concerned with the composition of the data
which remain.

Fig. 8. Effects of discarding Conditioning Ss with the conventional latency procedure upon the
amount and composition of data remaining. Data were obtained over Trials 1-60 and include only
responses with latencies greater than 150 msec., 1235 in Experiment 1 and 745 in Experiment 2.

Fig. 9. Effects of discarding Conditioning Ss with the conventional slope procedure upon the amount
and composition of data remaining. Data were obtained over Trials 1-60 and include only responses with
latencies greater than 150 msec., 1235 in Experiment 1 and 745 in Experiment 2.

in both experiments the relative frequency
of responses which are classified as condi- 
tioned by both slope and latency increases, 
and the relative frequency of responses 
which are classified as voluntary by both 
slope and latency decreases. At the 50% 
discard criterion recommended by Spence 
and Ross and by Hartman and Ross, the 
relative frequency of C-slope, C-latency 
responses has been increased to approxi- 
mately 60%, and the relative frequency of 
V-slope, V-latency responses has been re- 
duced to approximately 6%. 

Of particular interest in the analysis of 
a plot such as that in Figure 8 are the rela- 
tive frequencies of responses not classified 
alike by the latency and slope criteria. For 
the data presented in Figure 8, the incidence 
of responses categorized as conditioned by 
slope but voluntary by latency was origi- 
nally low and not greatly affected by ap- 
plying successively more strict criteria. On 
the other hand, the incidence of responses 
classified as conditioned by latency and 
voluntary by slope was initially about 27%, 
and it is of interest to determine what hap- 
pens to this frequency as Ss are discarded. 
Looking at the dashed lines connecting 
filled circles in Figure 8, it is clear that not 
until we apply very strict criteria indeed 
does any appreciable diminution in the rela- 
tive frequency of such responses occur. To 
the extent that one wishes to rid the data 
of responses of this ambiguous type, the 
latency discard criterion leaves much to be 
desired. 
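
The bookkeeping behind a discard plot of this kind can be sketched as follows. The sketch is hedged: the subject labels and latencies are hypothetical, each S is assumed to contribute at least one response with latency above 150 msec., and an S is assumed to be discarded when the share of his latencies below 300 msec. equals or exceeds the criterion.

```python
CUT_MSEC = 300.0  # conventional latency criterion from the text


def v_latency_share(latencies, cut_msec=CUT_MSEC):
    """Fraction of one S's responses with latency below the cutting point."""
    return sum(lat < cut_msec for lat in latencies) / len(latencies)


def apply_discard(subjects, criterion):
    """Discard every S whose V-latency share is `criterion` or more.

    Returns (proportion of Ss remaining, proportion of responses remaining).
    """
    kept = {name: lats for name, lats in subjects.items()
            if v_latency_share(lats) < criterion}
    n_all = sum(len(lats) for lats in subjects.values())
    n_kept = sum(len(lats) for lats in kept.values())
    return len(kept) / len(subjects), n_kept / n_all


# Hypothetical latencies (msec) per S, already limited to responses above 150 msec.
subjects = {
    "S1": [420, 450, 380, 410],
    "S2": [200, 250, 220, 430, 240],
    "S3": [210, 440, 460, 470],
    "S4": [230, 240, 410, 450],
}

# One point per discard criterion, from lenient (1.0) to strict (0.25).
curve = {c: apply_discard(subjects, c) for c in (1.0, 0.75, 0.5, 0.25)}
for criterion, (ss_left, responses_left) in curve.items():
    print(criterion, round(ss_left, 2), round(responses_left, 2))
```

Tabulating the four joint response categories within the Ss who remain at each criterion would give the analogue of the lower panels.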

In Figure 9 are presented comparable 
data for discard criteria based on the slopes 
of responses. As before, at the right end of 
the abscissa all Ss would be discarded who 
had 0% or more responses with slopes 
steeper than 40%. At the left end of the 
abscissa only those would be discarded who 
had 100% responses with slopes greater 
than 40%. A gross comparison of the data 
in Figure 9 with the data in Figure 8 reveals 
at least two things. First, the cost in data 
thrown away of applying a given percentage 
discard criterion is greater with a slope 
criterion than with a latency criterion. For 
example, a 50% criterion value results in 
the discarding of some 40% of the Ss with
the slope criterion and only some 20% of
the Ss with a latency criterion. It also is 
apparent that with the greater cost comes a 
greater gain. Thus, with a 50% slope cri- 
terion the relative frequency of responses 
classified as voluntary by both slope and 
latency is reduced to below 3%. Even more 
important, the relative frequency of re- 
sponses classified as conditioned by both of 
the criteria is increased to better than 80%. 
These values of 3% and 80% are to be com- 
pared with the values of 6% and 60% pre- 
sented above in the case of the latency 
criterion. In addition, the ambiguous re- 
sponses classified as voluntary by slope but 
conditioned by latency show a clear decline 
as the discard criterion becomes more strict. 

It was shown by Hartman and Ross that 
the 300-msec. latency criterion and the 40% 
slope criterion were not equivalent bases for 
discarding Ss from eyeblink conditioning 
experiments in which a ready signal was not 
employed. It is apparent from the data just 
discussed that the conventional latency and 
slope criteria are also not equivalent bases 
for discarding Ss from at least some eye- 
blink conditioning experiments in which a 
ready signal is employed. As in the case of 
the Hartman and Ross data, the dis- 
crepancy between the two kinds of criteria 
arises because of the frequent occurrence of 
responses with steep slopes but with 
latencies considerably in excess of the origi- 
nal 300 msec. latency cutting point. 

Before going on to analyze the implica- 
tions of the discrepancies just noted, it may 
be of interest to look briefly at the learning 
curves generated by Ss classified as voluntary
or nonvoluntary by the conventional
50% latency and slope criteria. These data
are presented in Figure 10. The 27 Ss in Ex- 
periment 1 and 17 Ss in Experiment 2 who 
were classified as nonvoluntary by both 
criteria show a gradually increasing learn- 
ing curve beginning at approximately 12% 
conditioned responses and increasing to over 
60% in Experiment 1 and about 55% in 
Experiment 2. Consistent with earlier re- 
ports (Goodrich, Markowitz, & Wall, 1963; 
Spence & Ross, 1959) that voluntary Ss 
began at higher levels and conditioned to 
higher levels, the four Ss in Experiment 1 

Fig. 10. ... responses in blocks of 10 trials for Ss in both
experiments falling in the categories formed by
jointly classifying Ss by the conventional slope
and latency criteria.


and five in Experiment 2 who were classified 
as voluntary by both criteria demonstrated 
average learning curves consistently above 
the curves for the nonvoluntary Ss. The
remaining curve in each plot represents the 
11 Ss in Experiment 1 and the 7 in Ex- 
periment 2 who were classified as voluntary 
on the basis of their response slopes but as 
nonvoluntary on the basis of their latencies. 
It will be seen that although the relative 
position of the curves for these Ss is not 
precisely the same in the two experiments, 
it falls between the other two functions. 
The intermediate position of these func- 
tions offers no clue as to the correct place- 
ment of these Ss in the voluntary or non- 
voluntary categories. 


Instructions to Blink and Not Blink 


The results of applying both latency and 
slope criteria to conditioning data were any- 
thing but completely satisfactory. Previous 
workers had evaluated the efficiency of the
two criteria by determining to what extent 
they discriminated between responses 
judged to be either voluntary or condi- 
tioned. It was natural, then, that when con- 
fronted with the discrepancy between the 
two criteria discussed above, we should 
consider making judgments of the responses 
in our data. This required, of course, that 
we carefully study the judgmental criteria 
as listed by Spence and Ross. According to 
their report, in order to qualify as a V response
an eyelid closure had to be sharp,
smooth, complete, and maintained until
after the air puff. The question arises as to 
the status of responses for which some of
these four characteristics are true and
others not. According to Spence and Ross,
“Any response not meeting the voluntary 
form criteria was considered to be a CR 
[Spence & Ross, 1959, p. 378].” The two 
judges in the Spence and Ross experiment 
were able to agree fairly well on which 
responses were to be called voluntary, which 
conditioned, and which ambiguous, but 
each of the two judges placed approximately
20% of the responses in the questionable
or ambiguous category. Apparently
it was often difficult to decide whether a 
response was sharp, smooth, complete, and 
maintained. This difficulty became very 
clear when we attempted to apply the judg- 
mental criteria to our data. A rather large 
number of responses appeared to be unclassifiable
in terms of the judgmental criteria
as we understood them. Moreover, it
seemed clear that after applying the criteria
we could not be sure that any differences
between our results and those of
Spence and Ross could be attributed to true 
differences in the data. They might just as
well be attributable to differences in inter- 
pretation of the stated bases for making 
judgments. It was apparent that in a very 
real sense we did not know what we were 
doing. The results of applying the latency
and slope criteria had led us to suspect that 
we were not sure these criteria were doing 
what they were supposed to do, and then 
we had discovered that we were not even 
sure what they were supposed to do. 

To eliminate V responses and Ss, we must 
be able to identify such responses and Ss.
The data discussed above were not consistent
with the assumption that the conventional
criteria were adequate. Thus it
was decided to repeat and extend the kind
of work which Spence and Taylor (1951) 
and Spence and Ross (1959) previously had 
reported rather casually. That is, we would 
seek examples of V and C responses and
determine their differentiating descriptive
characteristics, if any.

To obtain a pool of responses which could 
reasonably be regarded as voluntary, Ss
were instructed to blink while the CS was
being presented. For such Instructed-blink 
Ss, or at least for the subgroup of such Ss 
who received no air puff following the CS, 
it is reasonable to assume that no conditioning
can occur.⁷ To obtain a pool of
responses which could reasonably be re- 
garded as conditioned, other Ss were in- 
structed not to blink while the CS was on. 
Although probably less reasonable than the 
analogous assumption with respect to the 
Instructed-blink Ss, it was assumed that 
the responses provided by the Instructed- 
inhibit Ss were nonvoluntary and repre- 
sentative of C responses. The plan, then, 
was to examine closely the slope and latency 
characteristics of the responses obtained
under the two instruction conditions to 
determine descriptive characteristics dis- 
tinguishing V from C responses. 

Learning curves. Before turning to the 
latency and slope characteristics of the re- 
sponses given by the two kinds of Ss just 
described, let us look briefly at the effect of 
the two kinds of instructions on the overall 
level of responding. Figure 11 contains the 
mean proportion of anticipatory responses 
in blocks of 10 trials for each of the six 
different conditions employed in the experi- 
ment. The right-hand panel shows conven- 
tional learning curves for the Ss run under 
standard conditioning instructions. At the 
top of the left-hand panel we see the cor- 
responding data for Ss run under instruc- 
tions to blink. Clearly, these Ss cooperated 
in complying with the instructions; during 
most of the session they responded on about 
95% of the trials. In marked contrast to 
both these data and the data from the Ss 
run under standard conditioning instruc- 
tions are the data in the lower left-hand 
portion of Figure 11 for the Ss instructed 
not to blink. Clearly, these Ss were able to 


⁷ It has been brought to the writer's attention
that Peak (1933) reported an unpublished study
showing that for some Ss instructed to blink
“...the secondary winking response became so
automatic and habitual that even under conditions
of...‘no effort’...secondary reactions frequently
occurred [p. 82].” Aside from our inability to
evaluate the reliability of this finding, there is no 
information as to the nature of the obtained reac- 
tions. In any case, they did not result from CS- 
UCS pairings. 

Fig. 11. Mean proportions of anticipatory responses
in blocks of 10 trials for Ss in the several
instruction conditions. At the right are the Conditioning
Ss, 42 in Experiment 1 and 29 in Experiment 2.
At the lower left are the Instructed-inhibit
Ss, 30 in Fishbein's experiment and 52 in
Experiment 2. At the upper left are the Instructed-blink
Ss, 36 with UCS and 42 without the UCS in
Experiment 1, and 18 without the UCS in Experiment 2.


inhibit responding to a considerable ex- 
tent; even at the termination of 60 trials 
the group functions had not risen above
20% conditioned responses. The two curves,
one from Fishbein's experiment and one
from Experiment 2, are in close agreement.
Their relative positions may be explained 
by the fact that a 2-psi air puff was em- 
ployed in Fishbein's work, whereas a 5-psi 
puff was employed in Experiment 2. Presum- 
ably, responding based on a 5-psi puff would 
be more difficult to inhibit. The finding that
instructions not to blink resulted in a re- 
sponse level markedly lower than that 
produced by standard conditioning instruc- 
tions is consistent with earlier work on the 
effects of inhibitory instructions (e.g.,
Norris & Grant, 1948). 

Latency and slope distributions. Figures 
12 through 15 contain latency and slope dis- 
tributions for the various instruction con- 
ditions. Figure 12 presents the frequency 
distributions of latency for the Ss who 
received instructions to blink while the CS 
was on. The functions for the three groups 
of Ss were very much alike, and the latency 
characteristics apparently changed very 
little between the first and second halves of
the experiment. The mode of these distribu- 
tions fell between 200 and 300 msec., con- 
sistent with other data obtained under simi- 
lar instructions (Gormezano & Moore, 1962; 
Hartman, Grant, & Ross, 1960) and con- 
sistent with earlier work by Spence and 
Ross in which voluntary form responses 
obtained during conditioning occurred with 
a mode in this region. A comparison of the 
top two panels in Figure 12 shows that the 
presence or absence of the UCS apparently 
had little effect upon the latency distribu- 
tion. The possible exception to this state- 
ment is the minor mode occurring in the 
vicinity of 125 msec. in the top panel under 
conditions in which the UCS was presented 
on each trial. This small distribution pre- 
sumably represents unconditioned responses 
to the CS. The fact that such a distribution 
was more prominent when an air puff was 
presented may be explained by a sensitizing 
effect of the air puff. 

Figure 13 presents the frequency dis- 
tributions of response slopes for the same 

Fig. 12. Distributions of response latencies for
the Instructed-blink Ss in the two experiments
at two stages in training. The numbers of Ss in the
three panels are, from the top, 36, 42, and 18.

Fig. 13. Distributions of response slopes for the
Instructed-blink Ss in the two experiments at two
stages in training. Data include only responses with
latencies greater than 150 msec. Numbers of Ss
in the three panels are, from the top, 36, 42, and
18.


three experimental conditions. The distribu- 
tions in the top two panels, which represent 
the two Instructed-blink conditions from 
Experiment 1, are remarkably alike.⁸ They


⁸ It was noted that among the Instructed-blink
Ss, those who received the UCS tended to have 
their eyes at least partially closed when the puff 
arrived. In contrast, those Ss who did not receive 
the UCS usually had their eyes open completely 
at the time a UCS would have arrived. Thus 
although the slope and latency distributions, as 
well as the overall frequency of responding, were 
very much alike for these subgroups, the "dura- 
tion" of the response was different. This independ- 
ence of duration from slope and latency raises the 
possibility that duration of V responses may not 
be a fruitful general basis for identifying such 
responses. As we have seen, latency distributions 
of presumed V responses have also proved labile 
under several variables (Goodrich, 1964a; Gormezano
& Moore, 1962; Hartman, Grant, & Ross,
1960). If the slope of V responses should prove
to be generally invariant, it would be the best 
candidate on these grounds for the role of identi- 
fying such responses. 
both show a mode at approximately 75%
slope, skewness to the left, and little systematic
change between the first half and
the second half of the experiment. In gen- 
eral, they appear to be unimodal. The 
lower panel contains the slope distributions 
of the Ss in Experiment 2 run under In- 
structed-blink conditions. Unaccountably, 
these distributions appear to differ from 
the corresponding distributions in the upper 
two panels. The mode occurs in the vicinity 
of 55% slope instead of 75% slope, and there 
appears to be a shift between the first half 
and the second half of the experiment to- 
ward more shallow responses. 

Figures 14 and 15 present frequency dis- 
tributions of latency and slope for the Ss in 
Fishbein’s experiment and in Experiment 2
who were instructed not to blink while the 
CS was on. Figure 14 shows that latencies 
under these instruction conditions occurred 
predominantly late in the CS-US interval 
with a mode beyond 400 msec. There was 
a remarkable absence of responding prior to 
300 msec., the point conventionally said to 

Fig. 14. Distributions of response latencies for
the Instructed-inhibit Ss in Fishbein’s experiment
(N = 30) and Experiment 2 (N = 52) at two
stages in training.

Fig. 15. ... (N = 30) and Experiment 2 (N = 52) at two stages
in training. Data in Experiment 2 include only
responses with latencies greater than 150 msec.
Fishbein’s data include all latencies.


separate voluntary from conditioned re- 
sponding. An exception to this generaliza- 
tion is contained in the results from Experi- 
ment 2 where a large secondary mode, pre- 
sumably caused by unconditioned responses 
to the CS, occurred in the vicinity of 125 
msec. The relative absence of this mode in 
Fishbein’s experiment may be explained by 
his use of a relatively weaker air puff. As 
may be seen in Figure 15, the slopes of re- 
sponses which occur under instructions not 
to blink are predominately shallow with a 
mode in the vicinity of 10%. Although re- 
sponses with slopes less steep than 40% are 
in the majority, some responses occur with 
steeper slopes.

The distributions just presented in Fig- 
ures 12-15 clearly do not suggest any un- 
ambiguous dichotomies for distinguishing 
V from C responses. The responses assumed 
to be voluntary, those obtained under instructions
to blink, are distributed rather
widely over the range of possible latencies 
and the range of possible slopes. Only by 
choosing very short latencies or very steep 
slopes could regions be defined in which the 
great preponderance of V responses fall. 
The situation is somewhat better in the case 
of responses assumed to be nonvoluntary, 
those obtained under instructions not to 
blink. In this case, the responses tend to be 
more concentrated along both the latency 
and the slope scales. The important con- 
sideration, however, is whether by jointly 
considering Instructed-blink and In- 
structed-inhibit conditions one can choose 
cutting points on the latency or slope scale 
which would permit unambiguous classifi- 
cation of responses as voluntary or condi- 
tioned. It is apparent from examining the 

Fig. 16. Cumulative relative frequencies of response
latencies for Instructed-blink (no UCS) and
Instructed-inhibit Ss. Data were obtained from
Trials 1-60 (Trials 1-100 for Fishbein’s data)
and include only responses with latencies greater
than 150 msec. The total number of responses in
the Instructed-blink and Instructed-inhibit distributions
were 2345 and 375, respectively, in the top
panel, and 1016 and 363, respectively, in the
bottom panel.


data in Figures 12-15 that no such unambiguous
cutting points exist.

Error probabilities in classifying responses.
We may illustrate the problem of
determining cutting points by plotting on 
the same graphs the data from Instructed- 
blink and Instructed-inhibit Ss in such a 
way as to represent the probabilities of er- 
roneous classifications for each possible 
cutting point. Figure 16 is such a represen- 
tation for the latency data. We have seen 
that responses obtained under Instructed- 
blink conditions typically have shorter la- 
tencies than those obtained under In- 
structed-inhibit conditions. Therefore, we 
seek a latency cutting point such that re- 
sponses with shorter latency than that 
point will be called voluntary and those 
with longer latencies will be called condi- 
tioned. 

Ideally, the probability α of erroneously
labeling as conditioned a response which
really is voluntary would be zero, as would
the probability β of erroneously labeling as
voluntary a response which really is conditioned.
It is clear from the data presented
in Figure 16 that this ideal cannot be met
in our data: for no point along the latency
scale are the ordinates of the decreasing
function, representing the probability α,
and the increasing function, representing
the probability β, both zero.
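
The two error probabilities can be computed for any candidate cutting point once the Instructed-blink and Instructed-inhibit latencies are in hand. The sketch below is a hedged illustration only: the names `alpha` and `beta` stand for the two probabilities just defined, the latencies are hypothetical, and a response falling exactly on the cutting point is assumed to be called conditioned.

```python
def alpha(v_latencies, cut_msec):
    """Share of presumed V responses that would be labeled conditioned."""
    return sum(lat >= cut_msec for lat in v_latencies) / len(v_latencies)


def beta(c_latencies, cut_msec):
    """Share of presumed C responses that would be labeled voluntary."""
    return sum(lat < cut_msec for lat in c_latencies) / len(c_latencies)


# Hypothetical latencies (msec), all above 150 msec., for illustration only.
v_latencies = [180, 220, 240, 260, 280, 310, 350, 400, 450, 520]  # instructed to blink
c_latencies = [170, 290, 360, 410, 430, 450, 470, 480, 490, 500]  # instructed not to blink

# alpha falls and beta rises as the cutting point moves to longer latencies.
for cut in range(200, 551, 50):
    print(cut, alpha(v_latencies, cut), beta(c_latencies, cut))

# With overlapping distributions no cutting point makes both errors zero.
both_zero = any(alpha(v_latencies, cut) == 0 and beta(c_latencies, cut) == 0
                for cut in range(150, 601))
print(both_zero)
```

Because the two hypothetical samples overlap, the final check comes out false, which is the situation described for the latency data.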

Because α and β will both be nonzero, and
because one probability tends to increase
when the other decreases as we change the
cutting point, it is clear that in choosing a
cutting point we must consider the relative 
undesirability of the two kinds of error. Is 
it more undesirable to label as conditioned 
a true V response, or to label as voluntary 
a true C response? The first type of error 
means the inclusion of a response of a kind 
we do not want to study. The second type 
of error means the discarding of a datum 
which has cost both time and effort to ob- 
tain. Although it is clear that assigning 
weights to these two types of error is a dif- 
ficult task and that the results may be de- 
batable, it seems fairly clear to the writer 
that the first kind of error—including true 
V responses as C responses—must be re- 
garded as the more serious kind. It follows 
that the cutting point should be chosen in
such a way that α is smaller than β.

An examination of Figure 16 will show 
that the conventional latency cutting point 
of 300 msec. does not meet the requirements 
we have just argued. Not only is α not
smaller than β, but the magnitude of α is
sizable. Only about 4% (Experiment 1) to
10% (Experiment 2) of the true C responses
would erroneously be eliminated, whereas 
38% (Experiment 2) to 46% (Experiment 
1) of the true V responses would erroneously 
be retained. 

How shall a cutting point be determined? 

The “decision rule” which is generally
adopted in testing statistical hypotheses in- 
volves controlling the probability of one 
kind of error at an arbitrary small value 
and letting the other error probability take 
on whatever value is dictated by the meth- 
odology and true state of affairs. This is 
most clearly a satisfactory procedure when 
the error whose probability is controllable 
is markedly more serious than the other error.

If we regard the error of including a true
V response as markedly more serious than 
the error of discarding a true C response, 
we may follow a procedure similar to that 
in hypothesis testing. Let us set α, the probability
of including a V response, at some
arbitrary but small value, say .05, and let
β fall where it will. An analysis of the data
in Figure 16, using both linear extrapola- 
tion and the curves fitted “by eye” to the 
data points in both experiments, reveals 
that a cutting point of about 435 msec. obtains
for α = .05. The corresponding value
of β falls between .50 and .60, a value which
is unfortunately high but which the logic 
of our decision rule dictates we must toler- 
ate. In summary, then, if we assume that 
the instruction conditions have resulted in
“true” C and “true” V responses, and that
these are representative of such responses 
in conditioning situations, then the use of 
a latency cutting point of 435 msec. will 
result in including as C responses only about 
5% of the true V responses, and the simul- 
taneous elimination of about 55% of the 
true C responses. 
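
The decision rule just described can be sketched in code. The following is a minimal illustration only, assuming that raw latencies from the Instructed-blink (true V) and Instructed-inhibit (true C) conditions are available as arrays; the function name `latency_cut_for_alpha` and the use of the empirical distributions in place of the extrapolated and eye-fitted curves of Figure 16 are assumptions of this sketch, not the procedure actually used in the paper.

```python
import numpy as np

def latency_cut_for_alpha(v_latencies, c_latencies, alpha=0.05):
    """Return (cut, alpha_hat, beta_hat) for a latency cutting point.

    Responses with latency >= cut are retained as C; those below are
    discarded as V.  alpha_hat is the proportion of true V responses that
    would be retained, beta_hat the proportion of true C responses that
    would be discarded.  The smallest observed latency meeting
    alpha_hat <= alpha is chosen, since any larger cut only raises beta.
    """
    v = np.asarray(v_latencies, dtype=float)
    c = np.asarray(c_latencies, dtype=float)
    # Candidate cutting points: every observed latency, in ascending order.
    candidates = np.unique(np.concatenate([v, c]))
    for cut in candidates:
        alpha_hat = float(np.mean(v >= cut))
        if alpha_hat <= alpha:
            beta_hat = float(np.mean(c < cut))
            return float(cut), alpha_hat, beta_hat
    # No observed value satisfies the requirement: cut above all responses.
    cut = float(candidates[-1]) + 1.0
    return cut, 0.0, float(np.mean(c < cut))
```

Because α can only fall and β can only rise as the cutting point is moved toward longer latencies, fixing α at a small value leaves β to "fall where it will," as in the text.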

The same logic as that just used to develop
a cutting point for response latency
also applies to developing a cutting point 
for response slope. In Figure 17 are pre- 
sented the slope data in a form analogous 
to that of the latency data in Figure 16. In 
the case of slope, we have already seen that
responses obtained under Instructed-blink 
conditions typically have greater slopes 
than those obtained under Instructed-inhibit
conditions. Thus what is required is
a cutting point on the slope dimension such 
that responses with slopes greater than that 
value will be called V responses and those 
with slopes less than that value will be 
called C responses. It is clear from Figure 
17 that no cutting point exists such that 
both α, the probability of labeling as conditioned
a true V response, and β, the probability
of labeling as voluntary a true C response,
will be zero. As in the case of
latency, we must deal with the fact that
both α and β will generally be nonzero and
that one will increase as the other decreases.
Just as we examined the effect upon α
and β of the conventional latency cutting
point, we may also examine the corresponding
effect of the conventional cutting point
of 40% relative slope. Study of Figure 17
will show that use of a 40% cutting point 
would lead to approximately equal α and
β probabilities. The probability of erroneously
including a V response and the
probability of erroneously discarding a C 
response are both about .18. Although these 
values are somewhat more reasonable than 
the corresponding values for the conventional
latency cutting point, an α value
as large as .18 is quite inconsistent with 
our argument that the cost of making this 
kind of error is markedly greater than the 
cost of making the other kind of error. 
The data in Figure 17 were analyzed to 
determine the slope cutting point which 
corresponded to an α value of .05. Using
both linear extrapolation and the smoothed 
curve for the data of both experiments, we 
arrived at a cutting point in the region of 
20% slope. The corresponding value of β
was about .32 to .41. In summary, then, the
present analysis suggests that the use of a
slope cutting point of 20% will result in
the inclusion in our data of only about 5%
of the true V responses, and the simultaneous
elimination of about 36% of the true
C responses.

Fig. 17. Cumulative relative frequencies of response
slopes for Instructed-blink (no UCS) and
Instructed-inhibit Ss. With the exception of Fishbein's
experiment, data were obtained from Trials
1-60 and include only responses with latencies
greater than 150 msec. Fishbein's data were obtained
from Trials 1-100 and include all latencies.
Numbers of responses are the same as those in
Figure 16 except that Fishbein's data here consist
of 392 responses.


Reanalysis of Conditioning Data with New 
Response Definitions 


The analysis in the preceding section led 
to latency and slope cutting points which 
were quite different from the values which 
were recommended by previous research 
and which were employed in a previous 
section of this paper to analyze the rela- 
tion between slope and latency in condi- 
tioning data. Figure 18 shows for each ex- 
periment the total response pool broken 
down into the four response categories 
formed by applying simultaneously the new 
slope and latency cutting points. The agree- 
ment between experiments is excellent. In 
both cases, more than half of all responses
were classified as voluntary by both cri- 
teria, and fewer than 10% were classified as 
conditioned by both criteria. More responses 
were labeled voluntary than conditioned by 
both the slope and latency criteria, revers- 
ing the relation in the data presented earlier 
in Figure 7 for the conventional criteria. 
Whatever else the change in cutting points 
may have accomplished, it is clear that it 
did not eliminate the fact that slope and 
latency are not interchangeable ways of 
identifying V and C responses. 
Distributions of V and C responses. La- 
tency distributions for V and C responses 
defined by slope, and slope distributions for 
V and C responses defined by latency, were 
presented earlier in Figures 5 and 6 for the 
conventional cutting points. The corre- 
sponding distributions for the new cutting 
points will not be presented here because 
they do not take us much further than the 
summary data in Figure 18. As a result of
basing the cutting points on considerations
of errors in classification, we know that such
distributions contain certain proportions of
erroneously classified responses. However,
because these proportions refer to the true,
rather than the obtained, numbers of V and
C responses, the obtained distributions are
distorted in a fashion not easily corrected.

Fig. 18. Proportions of responses falling in the
four categories formed by jointly classifying responses
by the new slope and latency criteria. Data
were obtained from the Conditioning Ss in Experiment 1
(N = 42) and Experiment 2 (N = 29)
over Trials 1-60. Only responses with latencies
greater than 150 msec. were included, 1235 in
Experiment 1 and 745 in Experiment 2.

Estimating the number of true V and C
responses. Given our procedure for determining
the latency and slope cutting points
and the assumption of stability in the values
of α and β for different sets of data, it
becomes possible to estimate for any collection
of conditioning data the true numbers
of C and V responses.

Recall that α is the probability of including
a true V response and β is the probability
of discarding a true C response. Now let
$C_o$ and $V_o$ represent the obtained numbers
of included (C) responses and discarded
(V) responses, respectively, and $C_t$ and $V_t$
represent the true numbers of C and V responses,
respectively. Then $C_o = \alpha V_t + (1 - \beta) C_t$
and $V_o = (1 - \alpha) V_t + \beta C_t$.

Because numerical values of α and β arise
from the procedures used in differentiating
the responses into a retained set consisting
of exactly $C_o$ responses and a discarded set
consisting of exactly $V_o$ responses, there are
only two unknowns in the two equations
above, $C_t$ and $V_t$. Thus solutions may be
found for $C_t$ and $V_t$. For any set of conditioning
data, then, we not only can partition
the total pool of responses into two
sets, one of which theoretically contains all
C responses except for 100α% of the true
number of V responses, but we may also obtain
an estimate of the true number of V
(and C) responses. In Experiment 1, the es- 
timated proportion of all responses which 
were really V responses was .63 using the 
latency cutting point and .32 using the slope 
cutting point. In Experiment 2 the corre- 
sponding values were .56 and .35. It is ap- 
parent that the results of the two experi- 
ments are in good agreement, whereas the 
slope and latency procedures lead to rather 
disparate outcomes. 
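
As an illustration of how the two equations can be solved, the following sketch substitutes the identity that the true and obtained totals are equal (true C plus true V equals retained plus discarded) into the equation for the discarded set. The function name `estimate_true_counts` is hypothetical, and the code is offered only as one way of carrying out the algebra, not as the computation actually performed for the paper.

```python
def estimate_true_counts(c_obtained, v_obtained, alpha, beta):
    """Solve  C_o = alpha*V_t + (1 - beta)*C_t
              V_o = (1 - alpha)*V_t + beta*C_t
    for the true numbers of C and V responses (C_t, V_t).

    Since C_t + V_t = C_o + V_o = N, the second equation reduces to
    V_o = beta*N + (1 - alpha - beta)*V_t.
    """
    denom = 1.0 - alpha - beta
    if abs(denom) < 1e-12:
        raise ValueError("alpha + beta = 1: the equations have no unique solution")
    n = c_obtained + v_obtained
    v_true = (v_obtained - beta * n) / denom
    c_true = n - v_true
    return c_true, v_true


# Illustration with rounded values: 80 of every 100 responses discarded,
# alpha = .05, beta = .55.
c_t, v_t = estimate_true_counts(20, 80, alpha=0.05, beta=0.55)
```

With these rounded inputs the estimated proportion of true V responses is (.80 - .55)/(1 - .05 - .55), or about .62, which is close to the latency-based value of .63 given in the text for Experiment 1.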

Before gaining too much satisfaction 
from the procedures just described, we must 
recall from our earlier discussion that the 
discarding of responses per se is not generally
a meaningful tool of analysis in studies
of eyeblink conditioning. For similar
reasons, the possibility of estimating how 
many of the responses obtained in an exper- 
iment are actually V responses will not be 
regarded as particularly useful, since it per- 
mits no way of obtaining an estimate of the 
probability of the occurrence of a C re- 
sponse unbiased by the occurrence of a V 
response. Nonetheless, it may be of some 
interest to use estimates of $V_t$ and $C_t$ as dependent
variables in experiments which involve
the manipulation of variables, such
as UCS intensity, which presumably influence
the adoption by S of a voluntary
mode of responding. 

In any case, it is possible to show that
the equations above impose theoretical constraints
on any set of obtained data. These
restraints provide a way of testing the procedures
and assumptions employed here.
We note that the maximum number of responses
will be discarded when every response
is really voluntary, and the minimum
number of responses will be discarded
when every response is really conditioned.
If all responses were voluntary, then $C_t = 0$
so that $C_o = \alpha V_t$ and $V_o = (1 - \alpha) V_t$.
Thus when all responses are voluntary, the
relative frequency of discarded responses
would be (1 - α). When some of the responses
are actually conditioned, the relative
frequency of discarded responses would
be less than (1 - α). Similarly, if all of
the responses were actually conditioned,
then $V_t = 0$ so that $C_o = (1 - \beta) C_t$ and
$V_o = \beta C_t$. The relative frequency of discarded
responses under these conditions
would be β. To summarize these implications
of the present model, the smallest possible
relative frequency of discarded responses
will be β, obtained when all of the
responses are actually conditioned, and the
largest possible relative frequency of discarded
responses will be (1 - α), obtained
when all of the responses are actually voluntary.

To illustrate a possible application of 
these deductions, let us presume that air- 
puff intensity is directly related to the 
adopting of the voluntary mode of responding.
We would predict from the present arguments
that no matter how weak the puff,
no fewer than 100β% of the responses
would be discarded as voluntary, and that
no matter how intense the air puff no more
than 100(1 - α) percent would be so discarded.
On the basis of our earlier findings,
let us take α as .05 and β as .55 for latency
and .36 for slope, so that the limiting percentages
are 55% and 95% for latency and
36% and 95% for slope. The two sets of
conditioning data reported here provide results
which are consistent with these limits:
80% of the responses in Experiment 1 and
78% in Experiment 2 were discarded with
the latency cutting point, and 68% and 69%
with the slope cutting point.
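
The consistency check just described amounts to asking whether an obtained proportion of discarded responses lies between β and (1 - α). A minimal sketch follows; the function names are hypothetical and the numerical values are the rounded ones quoted in the text.

```python
def discard_fraction_bounds(alpha, beta):
    """Smallest and largest relative frequency of discarded responses
    permitted by the model: beta when every response is truly conditioned,
    1 - alpha when every response is truly voluntary."""
    return beta, 1.0 - alpha


def within_bounds(discarded_fraction, alpha, beta):
    """True if an obtained discard fraction is consistent with the model."""
    lower, upper = discard_fraction_bounds(alpha, beta)
    return lower <= discarded_fraction <= upper


# Obtained discard fractions quoted in the text (Experiment 1, Experiment 2).
latency_ok = [within_bounds(f, alpha=0.05, beta=0.55) for f in (0.80, 0.78)]
slope_ok = [within_bounds(f, alpha=0.05, beta=0.36) for f in (0.68, 0.69)]
```

An obtained fraction outside these limits would indicate that the assumed values of α and β, or the assumption that they are stable across sets of data, do not hold for the data in question.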

Fig. 19. Predicted and obtained latency distributions
of conditioning data. Predicted values
were obtained from a slope-derived estimate of
the relative frequency of V responses (see text).
Obtained data are based on Trials 31-60 for
Conditioning Ss and contain only responses with
latencies greater than 150 msec., 801 in Experiment
1 and 546 in Experiment 2.

Predicting the obtained frequency distributions.
The assumptions made in developing
the procedures described above suggest
another possibility which can be used to
test the adequacy of these assumptions. We 
have assumed that the Instructed-blink 
condition provides a picture of the distribu- 
tions of true V responses, and the In- 
structed-inhibit condition provides a pic- 
ture of the distributions of true C responses. 
In the preceding section we derived esti- 
mates of the relative frequency of true V 
responses in the conditioning data. Putting 
these relations together, it should be possi- 
ble to predict the obtained slope and latency 
distributions for the Conditioning groups 
by combining the appropriate distributions 
from Instructed-blink and Instructed-in- 
hibit conditions in the proportions dictated 
by the estimated relative frequency of true 
V and C responses in the data. 

Such derivations of the Conditioning data 
were carried out by first pooling across the 
two experiments the corresponding distri- 
butions for Instructed-blink Ss and for In- 
structed-inhibit Ss. Then the Instructed- 
blink and Instructed-inhibit distributions 
were combined in “correct” proportions to 
predict the Conditioning data. The “cor- 
rect” proportions for one set of results were 
obtained by using the estimate of the pro- 
portion of true V responses which resulted 
from dichotomizing responses with a la- 
tency cutting point. We saw in the previous 
section that this proportion was .63 for Ex- 
periment 1 and .56 for Experiment 2. For 
the present purposes, a value of .60 was 
employed. Thus the resulting synthesized 
distributions of slope and latency contained 
60% Instructed-blink responses and 40% 
Instructed-inhibit responses. Similarly, a
single estimated proportion of true V re- 
sponses, .33, was derived from the values in 
Experiment 1 (.32) and Experiment 2 (.35) 
which resulted from dichotomizing re- 
sponses with a slope cutting point. In this 
case the synthesized distributions of slope 
and latency contained 33% Instructed-blink 
responses and 67% Instructed-inhibit re- 
sponses. 
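
The synthesis described above is a weighted mixture of two relative frequency distributions. The sketch below assumes that the pooled Instructed-blink and Instructed-inhibit distributions are available as frequency counts over the same set of class intervals; the function name `synthesize_distribution` and the array representation are assumptions made for illustration.

```python
import numpy as np

def synthesize_distribution(blink_freqs, inhibit_freqs, p_voluntary):
    """Predicted relative frequency distribution for conditioning data.

    blink_freqs and inhibit_freqs are frequencies over identical class
    intervals for the Instructed-blink (true V) and Instructed-inhibit
    (true C) conditions.  Each is rescaled to unit area and the two are
    combined in the proportions p_voluntary and 1 - p_voluntary.
    """
    blink = np.asarray(blink_freqs, dtype=float)
    inhibit = np.asarray(inhibit_freqs, dtype=float)
    if blink.shape != inhibit.shape:
        raise ValueError("distributions must share the same class intervals")
    if not 0.0 <= p_voluntary <= 1.0:
        raise ValueError("p_voluntary must lie between 0 and 1")
    blink = blink / blink.sum()
    inhibit = inhibit / inhibit.sum()
    return p_voluntary * blink + (1.0 - p_voluntary) * inhibit
```

Calling the function with `p_voluntary=0.60` corresponds to the latency-derived estimate and with `p_voluntary=0.33` to the slope-derived estimate; in either case the predicted distribution has unit area, so it can be compared directly with an obtained distribution expressed in relative frequencies.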

The results of these analyses are shown in 
Figures 19-22. Figure 19 shows the pre- 
dicted latency distributions in comparison 
with the actual obtained distributions for 
the Conditioning groups in the two experiments.
Both distributions have unit area.
The predicted values contain 33% In- 
structed-blink responses. Analogous results 
are shown in Figure 20 for predicted values 
containing 60% Instructed-blink responses. 
It is apparent that the 33% value, which 
was determined from a slope cutting point, 
led to a more satisfactory fit to the obtained 
data in both experiments. The 60% value, 
determined from a latency cutting point, 
resulted in a poor fit caused by relatively 
too many predicted responses with short 
latency. 

Figures 21 and 22 present the same kind 
of data for distributions of response slope. 
The predicted distributions in Figure 21 
contain 33% Instructed-blink responses,
whereas those in Figure 22 contain 60% 
such responses. For Experiment 1 the 33% 
value, derived from using the slope cutting 
point, again produces the better fit of predicted
to obtained distributions. For Experiment 2
neither the 33% nor 60% values
clearly leads to better prediction.

Fig. 20. Predicted and obtained latency distributions
of conditioning data. Predicted values were
obtained from a latency-derived estimate of the
relative frequency of V responses (see text). Obtained
data are identical with those in Figure 19.

Fig. 21. Predicted and obtained slope distributions
of conditioning data. Predicted values were
obtained from a slope-derived estimate of the
relative frequency of V responses (see text). Obtained
data are based on Trials 31-60 for Conditioning
Ss and contain only responses with latencies
greater than 150 msec., 801 in Experiment 1 and
546 in Experiment 2.

The differences above between the 33% 
results and the 60% results are not the first
instances in the present report of disparate 
results of applying methods based on slope 
and latency. All of the cases we have en- 
countered have in common that many re- 
sponses are labeled as voluntary by one 
criterion and conditioned by the other. In 
the previous cases, there was no basis 
within the experiment for preferring one 
cutting point over the other. In the present
instance, differences are again apparent be- 
tween outcomes based on slope and latency 
cutting points. This time, however, the differences
pertain to predictive accuracy and
thus provide some grounds for choosing one
kind of cutting point over the other. Overall,
the data in Figures 19-22 suggest that the
more accurate synthesis of the conditioning
data results from combining Instructed-blink
and Instructed-inhibit distributions
in the proportions dictated by a slope,
rather than latency, cutting point.

Fig. 22. Predicted and obtained slope distributions
of conditioning data. Predicted values were
obtained from a latency-derived estimate of the
relative frequency of V responses (see text). Obtained
data are identical with those in Figure 21.

No attempt is made here to assess quan- 
titatively the fit between predicted and ob- 
tained distributions. The usual methods are 
inappropriate because the distributions are 
pooled over both Ss and responses within 
Ss, and the intent of presenting these data 
is to reveal possibilities and problems, not 
to provide precise tests. 

Effects of discarding V Ss. Entering into 
the preceding analyses have been the data 
from all Ss in the Conditioning groups. As 
we have already indicated, the practical 
value of identifying V responses is small
unless this identification is used as a basis
of discarding all the data from selected Ss 
in the experiment. The data in Figures 8 
and 9, discussed earlier, showed what hap- 
pened to the size and composition of the set 
of data remaining after discarding Ss as 
“voluntary responders.” The V and C re- 
sponses were defined by what we have called 
the conventional cutting points. We turn 
now to similar analyses carried out with the 
new cutting points determined by a con- 
sideration of errors in classification. 

Figure 23 shows the effects upon the data 
which remain after Ss are discarded for 
meeting successively more stringent latency
discard criteria. Towards the left end
of each abscissa are criteria which eliminate 
only those Ss with high relative frequencies 
of V-latency responses. Toward the right 
end large numbers of Ss are eliminated as 
more and more of them meet the require- 
ment of small proportions of V-latency re- 
sponses. Study of these data shows that no 
criterion seems to even moderately well ap- 
proximate the ideal state of affairs in which 
all but the responses labeled as conditioned 
by both slope and latency cutting points 
would be eliminated and a reasonable pro- 
portion of the data would be retained. As 
an illustration, we note that roughly half 
of the data has already been discarded with 
an 80% criterion, and that at that point the 
responses consistently labeled conditioned 
are still the smallest category and those con- 
sistently labeled voluntary are still present 
in large numbers. Even by tolerating un- 
reasonably large losses of data, the re- 
maining sample is only poorly rid of V re- 
sponses by a latency criterion for discarding 
Ss. 

Figure 24 shows the effects of discarding 
Ss on the basis of slope criteria. Comparison 
of the top portions of Figures 23 and 24 will 
show that for the same percentage discard 
criterion fewer Ss are eliminated with the 
slope criterion than with the latency criterion.
This finding reflects the fact, already
pointed out, that the slope cutting point 
identifies fewer responses as voluntary than 
does the latency cutting point. Study of 
the data in the lower portion of Figure 24 
shows that discarding Ss by slope criteria is
somewhat more satisfactory than discarding
them by latency, as judged by the composition
of the remaining data. Nonetheless,
the slope criterion still leaves much to be
desired. For example, half of the data are
again lost when the criterion is no more severe
than about 80%. At this point V-latency
responses predominate over C-latency
responses, and responses labeled consistently
as voluntary greatly exceed in number
those labeled consistently as conditioned.
The situation is only moderately
improved by tolerating larger losses in data.

[Figure 23: panels "Number of subjects and responses remaining" and "Composition of remaining sample" for Experiments 1 and 2; abscissa: subject discard criterion, % V-latency responses.]

Fig. 23. Effects of discarding Conditioning Ss with the modified latency procedure upon the amount
and composition of data remaining. Data were obtained over Trials 1-60 and include only responses
with latencies greater than 150 msec., 1235 in Experiment 1 and 745 in Experiment 2.

[Figure 24: ordinate: percentage; abscissa: subject discard criterion, % V-slope responses.]

Fig. 24. Effects of discarding Conditioning Ss with the modified slope procedure upon the amount
and composition of data remaining. Data were obtained over Trials 1-60 and include only responses
with latencies greater than 150 msec., 1235 in Experiment 1 and 745 in Experiment 2.

Overall, neither the slope nor the latency 
criteria for discarding Ss is particularly suc- 
cessful at achieving what it was designed to 
do: to discard from conditioning experi- 
ments a reasonable number of Ss in order to 
rid the remaining data of V responses, de- 
fined by the new cutting points developed 
in this paper. The reader will recall that 
similar procedures were more satisfactory 
for the task of ridding the data of responses 
defined by the conventional cutting points. 
An improved method of classifying re- 
sponses seems to have led to a markedly 
lowered efficiency of the procedures on 
which many workers in the area of eye- 
blink conditioning have depended for elim- 
inating voluntary influences from their 
data. 


A Direct Method of Identifying V Ss. 


The methodology discussed so far has 
followed the earlier work (Hartman & 
Ross, 1961; Spence & Ross, 1959) by deal- 
ing with response distributions pooled over 
all Ss and responses. Although responses 
are the units within these distributions, the 
individual S, rather than the individual re- 
sponse, is the actual sampling unit. If the 
latencies or slopes of responses of individual 
Ss tend to cluster closely about average 
values, and the average values for different 
Ss are spread over the range of possible 
values, the adding or subtracting of a few 
Ss at random may have a marked effect on 
the shapes of the pooled response distribu- 
tions. 

The empirical question here is whether 
the shapes of obtained response distributions
arise in large measure from the distribution
of Ss per se. Clearly, the question
cannot be approached by asking about the 
shapes of distributions for individual Ss, 
because an S could make at most 60 responses
in the experiments reported here.
The remaining approach involves looking at 
the way in which Ss themselves are distrib- 
uted. To this end, the median slope and 
median latency over all 60 trials were cal- 
culated for each S. These data for the Ss 
in the Conditioning groups are presented in 
Figure 25. In the lower right-hand corners 
of both the top and bottom sections of this 
figure are scatter plots showing the rela- 
tion between median slope and median la- 
tency, each point locating one S. In the up- 
per portion of each section is the marginal 
latency distribution; in the left portion is 
the marginal slope distribution. 

An examination of Figure 25 in conjunc- 
tion with Figures 3 and 4, discussed pre- 
viously, will show that the distributions of 
Ss’ medians have roughly the same shapes 
as the pooled response distributions. Con- 
sider, for example, the latency distributions 
for Experiment 2. Both the response and S
distributions are skewed to the left. More- 
over, the presence of a mode around 200 
msec. in the response distribution, which did 
not appear in Experiment 1, is paralleled 
(and probably explained) by the left-hand 
mode in the distribution of Ss’ medians. 

Figure 26 presents distributions of indi- 
vidual Ss along the latency and slope di- 
mensions for Ss run under the Instructed- 
inhibit and Instructed-blink conditions. It 
is apparent from both the scatter plots and 
the marginal distributions that these dis- 
tributions of Ss’ medians also correspond 
rather well to the distributions in Figures 
12-15 based on pooled responses. Along the 
latency dimension, Ss instructed to blink 
tended to fall in the region between 200 and 
400 msec. with a mode near the middle of 
the range, whereas Ss run under instructions 
not to blink fell mostly in the region be- 
tween 350 and 500 msec. Along the slope di- 
mension, Ss run under instructions to blink 
were distributed in the region of steep 
slopes whereas Ss run under instructions 
not to blink were located generally in the 
region of rather shallow slopes. 

Fig. 25. [Scatter plots and marginal distributions] of median slopes and median latencies for individual
Conditioning Ss. Data were obtained over Trials 1-60 and include only responses with latencies
greater than 150 msec.


In summary, it seems clear that in large 
measure the shapes of the pooled response 
distributions discussed in previous work 
and in earlier sections of this report reflect 
the way in which the typical responses of 
individual Ss are distributed, not simply 
the processes occurring in each S.

Error probabilities in classifying Ss. At- 
tention to the data of individual Ss suggests 
a redefinition of the task of identifying V 
and C Ss. Thus far the procedure for discarding
Ss has started with first classifying
responses, then discarding Ss. The validation
of the discarding of Ss consisted in
evaluating how successfully it eliminated 
responses of certain descriptions from the 
data. Apart from possible complications 
arising from the fact that the classification 
of responses rests on pooled response distri- 
butions, we have seen above that the con- 
ventional procedure simply does not work 
well. 

Perhaps a more direct approach is possible,
one in which Ss would directly be classified
as voluntary or not. With such a
procedure, the experimental conditions involving
blink and inhibit instructions in this
paper would be regarded as sources of subject
types rather than response types. The
logic formerly applied to error probabilities
in classifying responses would here be applied
to the analogous error probabilities in
classifying Ss.

Fig. 26. Scatter plots and marginal distributions of median slopes and median latencies for individual
Ss in the Instructed-inhibit and Instructed-blink (no UCS) conditions. Data were obtained over
Trials 1-60 and include only responses with latencies greater than 150 msec.

The α and β error probabilities in attempting
to classify Ss directly are shown
in Figures 27 and 28. As before, it is assumed
that instructions to blink provide V
Ss, and instructions not to blink provide C
Ss. The ordinates of the descending functions
in Figure 27 show values of α, the
probability of including a V S. The ordinates
of the ascending functions are values
of β, the probability of discarding a C S.
We must keep in mind in the following discussion
that these distributions of Ss’ medians
are less stable than the analogous
plots, discussed earlier, of distributions
based on pooled responses. In spite of the
probable limitations imposed by the small
samples involved here, it is possible to show
how one might proceed to develop cutting
points for classifying Ss. The logic is exactly
like that previously employed. Because the
cost of erroneously including a V S is much
greater, it is supposed, than the cost of discarding
a C S, we set α at the small arbitrary
value of .05. The latency cutting point
which corresponds to α = .05 is roughly 380
msec. Thus if we discard an S whenever his
median latency is less than 380 msec., we
will in the long run include only about 5%
of the true V Ss in our sample. Depending
on how we use Figure 27 to estimate β, we
should expect to discard between 10% and
30% of the true C Ss.

Figure 28 represents the data for determining
a slope cutting point. Here α is represented
by the ordinates of the ascending
functions, and β by the ordinates of the
descending functions. The slope value which
results from setting α at .05 is roughly 30%.
The corresponding value of β is somewhere
between .15 and .24. To state again the implications
of this analysis: using a slope
cutting point of 30% should result in including
about 5% of the true V Ss and discarding
15% to 24% of the true C Ss.

Fig. 28. Cumulative relative frequencies of
median response slopes for individual Ss in the
Instructed-inhibit and Instructed-blink conditions.

It is interesting to note that when both
the latency and slope cutting points are applied
simultaneously in a disjunctive fashion,
none of the Instructed-blink Ss are
classified as C Ss and 46% of the Instructed-inhibit
Ss are classified as V Ss. That is,
if an S is discarded whenever either his
median latency is less than 380 msec. or his
median slope is greater than 30%, very
nearly 100% of the true V Ss will be discarded,
whereas 54% of the true C Ss are
retained. (Some V Ss undoubtedly would be
retained, in spite of the fact that all of the
present Instructed-blink Ss would be discarded.)
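
The disjunctive rule can be stated compactly in code. The sketch below assumes that each S's response latencies (in msec.) and relative slopes (in percent) have already been restricted to responses with latencies greater than 150 msec., as in the analyses reported here; the function name `discard_subject` and its argument names are hypothetical.

```python
from statistics import median

def discard_subject(latencies_msec, slopes_pct,
                    latency_cut=380.0, slope_cut=30.0):
    """Disjunctive discard rule for a single S.

    The S is classified as voluntary (and discarded) if EITHER the median
    response latency is below latency_cut OR the median relative response
    slope exceeds slope_cut.  Defaults are the cutting points suggested in
    the text.
    """
    short_latency = median(latencies_msec) < latency_cut
    steep_slope = median(slopes_pct) > slope_cut
    return short_latency or steep_slope


# An S is retained only if both medians fall on the "conditioned" side.
retained = not discard_subject([400, 420, 450], [10, 20, 25])
```

Because an S is discarded for failing either criterion, the rule necessarily discards at least as many Ss as either cutting point applied alone, which is the source of the large loss of true C Ss noted above.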
The breakdown of the total number of
Ss in the two Conditioning groups in the 
present study into the four categories 
formed by slope and latency cutting points 
is shown in Figure 29. The results of the two 
experiments are in good agreement. However,
we again encounter a discrepancy between
latency and slope procedures. The
two categories containing Ss which were not 
classified alike by the two cutting points 
contain about 25% of all Ss. In the light of 
the results discussed in the last paragraph 
showing the efficacy of a disjunctive cri- 
terion, the discrepancy between slope and 
latency results illustrated in Figure 29 
would probably best be circumvented with 
the disjunctive criterion. With this criterion 
about 46% of the Ss would be discarded, 
which admittedly is not a happy outcome. 
Nonetheless, if the assumptions are valid 
under which the cutting points were de- 
veloped, the remaining 54% of the Ss should 
consist almost exclusively of true C Ss. 
Learning curves for V and C Ss. As a final 
order of business, Figure 30 presents the 
learning curves of the Ss who did and did 
not meet the disjunctive criterion above. 
The discarded Ss responded more frequently 
throughout training than the retained Ss. 
These data are similar to those cited by
Spence and Ross (1959) and by Goodrich,
Markowitz, and Wall (1963). It is clear
that the estimated probability of an anticipatory
response is smaller after the discarding
of suspected V Ss.

Fig. 30. ... failed to meet both of these criteria (discarded Ss).


Discussion 


The major functions of the present re- 
port are two in number: first, to display in 
greater detail than heretofore the data of 
eyeblink conditioning experiments as these 
data have been analyzed for V responses, 
and second, to repeat and extend the ap- 
proach originated by Spence and his asso- 
ciates to attempt an evaluation of the cur- 
rent status of this methodology. The initial 
examination of the conditioning data sug- 
gested that even samples as large as those 
employed here are insufficient to insure 
that pooled distributions will have the same
shapes upon replication. This is but another
instance of the extremely large variability 
often characteristic of eyeblink conditioning 
data (cf. Spence, 1964, pp. 131, 133). In 
addition, the present results show clearly 
that the discrepancy between latency and 
slope analyses of V responses is not found 
only when a ready signal is omitted; the two 
modes of analysis were not equivalent meth- 
ods of discarding V Ss when a ready signal 
was employed. 

One outcome of the present extension of 
the approach of Spence and Taylor (1951) 
and Spence and Ross (1959) was the de- 
velopment of new cutting points for identi- 
fying V responses, cutting points which 
were explicitly rationalized in terms of the 
probabilities of errors of classification. The
deductive consequences which arose from 
the assumptions used in developing these 
cutting points cannot be regarded as either 
supporting or casting doubt on the assump- 
tions. It remains to be seen whether future 
research will be able to make use of the 
suggested methods. Unfortunately, use of 
the cutting points for identifying V re- 
sponses to eliminate V Ss in the manner of 
Spence and Ross (1959) was quite unsuc- 
cessful and casts doubt on this particular 
approach to discarding Ss. 

The most interesting result of extending 
the basic approach of earlier investigations 
was the direct identification of V Ss by their 
resemblance to Ss instructed to blink. The 
principal recommendation arising from this 
procedure and the results presented here is 
that an S be eliminated from an eyeblink 
conditioning experiment if his median re- 
sponse latency is less than 380 msec. or his 
median relative response slope exceeds 307%. 
The main rationale for this discard rule is 
the implication of the present experiments 
that nearly all of the V Ss would be eliminated
by application of this disjunctive
rule.
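
The disjunctive discard rule can be written out as a short sketch. The function and argument names are illustrative assumptions and are not part of the original procedure. The 380-msec. latency cutoff is the one stated above; the slope cutoff is left as a required argument, to be supplied in the same units as the slope measure, and the value used in the example calls is a placeholder only.

```python
from statistics import median

LATENCY_CUTOFF_MSEC = 380.0  # latency cutoff stated in the text


def is_discarded(latencies_msec, relative_slopes, slope_cutoff,
                 latency_cutoff=LATENCY_CUTOFF_MSEC):
    """Apply the disjunctive rule to the responses of one S.

    latencies_msec and relative_slopes hold one value per scored response.
    slope_cutoff must be supplied by the caller, in the same units as
    relative_slopes.
    """
    if not latencies_msec or not relative_slopes:
        raise ValueError("S made no scorable responses")
    short_latency = median(latencies_msec) < latency_cutoff
    steep_slope = median(relative_slopes) > slope_cutoff
    # Disjunctive: either condition alone is enough to discard S.
    return short_latency or steep_slope


# Hypothetical data. The slope cutoff of 30.0 is a placeholder, not a
# value taken from the report.
fast_s = is_discarded([300, 350, 400], [10, 12, 14], slope_cutoff=30.0)
retained_s = is_discarded([400, 420, 450], [10, 12, 14], slope_cutoff=30.0)
steep_s = is_discarded([400, 420, 450], [40, 50, 60], slope_cutoff=30.0)
```

Because the rule is disjunctive, an S is retained only when both medians fall on the conditioned side of their cutoffs, which is why the proportion of discarded Ss is larger than under either criterion alone.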

Unfortunately this rationale is not fully 
adequate. First, the distributions of Ss' me- 
dians upon which the cutting points were 
based contained too few cases to permit 
uncritical application of the exact values 
reported here. Clearly, validation research
is required. Second, it must be remembered
that a particular set of experimental param- 
eters was employed in the present experi- 
ments. Certain of the numerical results may 
not be invariant with changes in these pa- 
rameters. One may entertain some cautious 
optimism, however, in view of the finding 
that latency and slope distributions for Ss 
instructed to blink were invariant with
presence-absence of the UCS (Experiment 
1, Figures 12 and 13). Third, we cannot be 
altogether happy with the assumptions con- 
cerning the relation between instructions 
to blink or not blink and the occurrence of 
V and C responses. Few would argue that 
the description of Ss instructed to blink as 
“voluntary” is inappropriate. But we can- 
not yet be sure that “voluntary” means the 
same thing here as in the conditioning situ- 
ation. Does, for example, a V S whose re- 
sponses have as their controlling outcome 
the successful compliance with instructions 
to blink have the same median slope and 
latency as an S whose responses have as 
their controlling outcome the mitigation of 
the unpleasantness of a puff in the eye? We 
have assumed the answer is “yes,” but the 
question has not been put to an adequate 
test. 

More open to question is the assumption 
that responses which occur under instruc- 
tions not to blink are conditioned or non- 
voluntary. It does appear reasonable that 
such responses are not voluntary blinks. 
But are they pure conditioned responses? 
The possibility exists that such responses 
are a complicated resultant of a true condi- 
tioned closure with a voluntary opening. It 
is interesting to note that the typical re- 
sponse under inhibit conditions, like the 
typical conditioned response described by 
Spence and Ross (1959, p. 378), was a 
gradual and irregular movement resembling 
a tense, slow, and controlled limb movement.
The existence of an eye-opening response
cannot be doubted. Record C in
Figure 1 illustrates a trial on which an 
opening occurred without a closure. Such 
responses were not uncommon under the 
inhibit condition and are occasionally seen 
in Ss run under ordinary conditioning 
instructions. If the form of the typical conditioned
response is thought to be the resultant
of opposing blink and opening responses,
it is apparent that we may not
have isolated the conditioned blink at all. 

Fortunately, the rationale for the proce- 
dures recommended in this report depends 
far less critically on interpreting the results 
of instructions not to blink than it does on
interpreting the results of instructions to 
blink. This difference in importance arises 
because of the presumption that an error 
of falsely including a V S is far more seri- 
ous than the error of falsely discarding a C 
S. Thus the only matter which actually de- 
pends upon interpreting the distributions 
from Ss instructed not to blink is the esti- 
mation of the proportion of true C Ss which 
are discarded. 

One additional problem may be raised. 
An S under conditioning instructions may 
adopt, or even abandon, the voluntary mode 
of responding any time during a series of 
trials. The available procedures for dis- 
carding V Ss, including those developed in 
the present paper, make use of the median 
of S's latencies or slopes over the entire 
training session. It is clear that if S became 
a voluntary responder during the second
half of the session, he probably would not be 
discarded and his data for the latter part 
of acquisition would be combined with data 
from C Ss. A check was made for the two 
sets of conditioning data in the present 
study of whether different Ss were identi- 
fied as V Ss when all 60 trials were included 
as compared with when only the last 30 
trials were included. A few Ss changed 
classification, but because these generally 
were Ss who made few responses and be- 
cause the number of cases on which the 
medians were based was not generally the 
same for Trials 1-30 as for Trials 31-60, a
conclusion could not be reached as to 
whether Ss had actually adopted the volun- 
tary mode of responding late in training. 
The problem may be more serious with 
weaker UCS intensities than with the strong 
5-psi puff employed here because a weak 
UCS may become increasingly aversive 
with successive trials. A very strong UCS, 
in contrast, may induce avoidance from the 
outset. 
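
The whole-session versus late-session check described above can be sketched as follows. This is a simplified illustration that uses the latency criterion only; the function names and the data are hypothetical, and trials on which S made no response are represented by None.

```python
from statistics import median


def median_latency(latencies_by_trial, first_trial, last_trial):
    """Median latency over trials first_trial..last_trial (1-indexed, inclusive).

    latencies_by_trial has one entry per trial; None marks a trial on
    which S made no response. Returns None if the window has no responses.
    """
    window = latencies_by_trial[first_trial - 1:last_trial]
    responses = [x for x in window if x is not None]
    return median(responses) if responses else None


def flagged(latencies_by_trial, first_trial, last_trial, cutoff_msec=380.0):
    """True if the median latency in the window falls below the cutoff."""
    m = median_latency(latencies_by_trial, first_trial, last_trial)
    return m is not None and m < cutoff_msec


# Hypothetical S who shifts to short-latency responding on Trial 31.
late_shifter = [500.0] * 30 + [300.0] * 30
whole_session = flagged(late_shifter, 1, 60)   # median is 400 msec.
last_half = flagged(late_shifter, 31, 60)      # median is 300 msec.
```

In this constructed case the S escapes the whole-session rule but is flagged on Trials 31-60. As noted above, such a comparison is hard to interpret when the two windows contain different numbers of responses.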


Clearly the present work has not established
that rules for identifying voluntary
processes in eyeblink conditioning are necessary.
Gormezano (e.g., 1965, pp. 63-67)
has questioned the advisability of applying
such rules. Yet the possibility cannot easily
be dismissed that voluntary processes play
a contaminating role.9 The present study,
like others before it, achieved some limited 
success in setting up rules for ridding data 
of the influence of V responses. But as we 
have seen, these rules stand upon assump- 
tions which remain questionable and open 
to examination in future research. It 
certainly cannot be argued that further 
research efforts and refinements of the ap- 
proach represented here will not bring im- 
provement in the situation. Yet the thought 
remains that we will not be able to decide 
with confidence in the near future whether 
eyeblink conditioning can profitably be 
studied as a clear example of classical con- 
ditioning as understood here. 

This discussion, of course, pertains to 
eyeblink conditioning experiments in which 
the UCS is an air puff; it is with such a 
UCS that the avoidance possibility arises.10
Possibly the problems raised concerning 
voluntary processes may be circumvented 
by employing as a UCS an electric shock or 
other aversive stimulus to elicit the UCR. 
Several of the early studies of eyeblink con- 
ditioning did use shock (e.g., Cason, 1922). 
Some fairly unsystematic exploratory work 
in the writer’s laboratory has shown rather 
large amounts of “adaptation” of the UCR 
to both shock and auditory stimuli as un- 
conditioned stimuli. It is quite possible, 
however, that effective techniques can be 
found. We might expect that to the extent 
such techniques eliminate the possibility of
avoidance, fewer Ss would be discarded with
the criteria developed in the present report.

9 Prokasy (1965) has recently addressed himself
to some of the instrumental-like features of eyeblink
conditioning.

10 Since the present study was carried out, Spence
and his associates (e.g., Spence, Homzie, & Rutledge,
1964) have reported on a “masking procedure”
designed to preclude S's recognizing that he
is in an eyeblink conditioning experiment. This
procedure has thus far been used mainly to
eliminate “inhibitory sets” at the outset of extinction,
but it presumably has some bearing on the
presence of voluntary responding during acquisition
as well.

It was suggested earlier that among all 
conditioning situations employed, eyeblink 
conditioning, as usually carried out, is one 
of the most likely to involve instrumental 
processes. Other workers have found other 
grounds for doubting the value of studying 
eyeblink conditioning. In a volume directed 
largely to the analysis of the role of classical
conditioning in behavior, Mowrer (1960)
devoted but one page to “short-latency”
reactions such as the eyeblink. According
to Mowrer, the really clear examples of 
classical conditioning involve “emotional” 
reactions. The short-latency reactions, on 
the other hand, are “of comparatively little 
biological importance and do not, apparently,
show learning in its most typical
important form [Mowrer, 1960, p. 386].”
Razran seemed to express much the same 
view, listing the eyeblink among condi- 
tioned responses which are “organismically 
inconsequential” and “not within the center 
of the organism’s needs and action [Razran, 
1961, p. 99].”

We need not agree at once with these as- 
sessments of eyeblink conditioning. It will 
suffice to devote some thought and research 
effort to the issues raised by these views 
and by the analyses presented in the present 
report. If these considerations do not dic- 
tate abandoning eyeblink conditioning, cer- 
tainly they do not suggest complacency. 
Whatever the alternatives, the suggestion is 
strong that when we conduct eyeblink con- 
ditioning experiments we do not, in an im- 
portant sense, know what we are doing. 


SUMMARY 


Presumptive evidence exists for the pres- 
ence of avoidance responding in some eye- 
blink conditioning experiments. Previous 
research led to procedures for discarding 
voluntary Ss from such experiments if these 
Ss exhibited large numbers of V responses
defined by criteria based on response la-
tency or response slope. The latency cri- 
terion was used in experiments which em- 
ployed a ready signal; the slope criterion 
was used in experiments which did not em- 
ploy a ready signal. 

The first part of the present study was 
designed to determine the relation between 
response slope and latency in experiments 
using a ready signal and to determine 
whether the conventional slope criterion is 
equivalent to the conventional latency cri- 
terion in such experiments. The results 
showed that latency and slope criteria are 
not equivalent bases for identifying V re- 
sponses in experiments with a ready signal. 

The second part of the study was con- 
cerned with developing new criteria for 
identifying V responses. Essentially, the 
procedure involved isolating true V and C 
responses by instructing different Ss to 
blink or not blink to the CS. Analyses of the
slope and latency data of these Ss showed 
that the conventional criteria were highly 
questionable because they placed relatively 
too little weight on the error of including a 
true V response. New criteria were specified 
so as to hold at .05 the estimated probability
of such an error. Unfortunately, the
usual procedure of discarding Ss became 
highly ineffective with the new definitions 
of V responses. The most defensible proce- 
dure hit upon was based on a direct analy- 
sis of the median latencies and slopes of the 
Ss instructed to blink or not to blink. 

The eyeblink situation is far from the 
simple instance of classical conditioning 
which at first it appears to be. The value 
of available methods for removing the in- 
fluence of voluntary processes from data 
obtained in this situation remains equivocal.
The complications displayed in the
present paper must be considered in evaluating
the eyeblink experiment as a technique
for the study of classical conditioning.

The ringex is a model of the cognitive organization of interpersonal behavior
in reciprocal roles. A simple scheme is presented which defines 64
interpersonal variables in terms of 6 dichotomous facets. The relationship 
among the variables is predicted from their sequence of differentiation in 
the child. It is also suggested that, as the individual matures, the initial 
developmental pattern of interrelationship is modified by the influence of 
the behavior of the other person. These hypotheses appear to be supported 
by data referring to the reciprocal roles of husband and wife. In the discus- 
sion of the finding, an attempt is made to integrate developmental, 
cognitive, cross-cultural and abnormal aspects of interpersonal behavior. 


The role structure of the family has been 
described in another paper (Foa, Triandis, 
& Katz, 1966), which supplied a broad pic- 
ture of the relationship between these roles, 
but gave no details about the structure of 
each of them. Here only two of the family 
roles are considered, husband to wife and 
wife to husband. A closer, more detailed 
view of their inner structure and of their 
interrelationship is, however, presented, 
which may serve as a model for other re- 
ciprocal roles as well. Thus, while the em- 
pirical data refer to the husband and wife 
roles, the theoretical treatment is concerned 
with reciprocal roles in general. 

The picture any of us has of his relation- 
ship to another person, in reciprocal roles, 
looks amazingly complex: it contains the 
behavior of the observer himself and the
behavior of the other; the actual behavior 
and the corresponding norm; the percep- 
tion and norms from the observer’s point of 
view; and the perception and norms the 
observer ascribes to the other. Each of these 
perceptions refers to a given type of be- 
havior. There are several types of behavior 
to be considered: behavior toward the self 
and behavior toward the other; behavior 
concerned with affect and behavior con- 
cerned with status; behavior which takes 
away and behavior which gives. 

It is unlikely that the observer records 
these perceptions of different types of be- 
havior in a haphazard manner. If this were 
the case, it would be difficult for him to 
compare his behavior towards the other 
with the other’s behavior towards him, 
actual as opposed to ideal behavior, and
so on. This type of reference is in constant
use in daily life. It is probable that these 
different perceptions appear in a certain 
order and form an organized picture of our 
observer's relationship with the other. This 
study is an attempt to describe this organi- 
zation. 


DEFINITION OF THE VARIABLES 


The picture we are going to describe is
not the only possible one, nor is it the most 
comprehensive. Other variables could have 
been chosen and more added to those defined.
The selection of our variables, analyzed
in this study, can be justified on three
counts: 

a. They are based on notions which are 
well established in sociopsychological litera- 
ture; 

b. They are generated in a systematic 
manner rather than chosen arbitrarily; 

c. They lead to the prediction of empiri- 
cal results. 

Each of the variables discussed here 
defines a certain perception of a certain be- 
havior. Indeed, each type of behavior is 
perceived in several different ways and each 
type of perception may refer to different be- 
haviors. The variables can therefore be 
cross-classified according to the type of 
perception and the type of behavior. It 
follows that in order to define the variables 
it will be sufficient to define types of per- 
ception and types of behavior. The com- 
bination of any of these two types will 
produce a variable. 


The Perceptual Types 


Consider any interpersonal behavior, 
like, for example, friendliness to the other. 
One does differentiate between his friend- 
liness to the other and the friendliness of 
the other to him; between actual friend- 
liness and the corresponding norm; between 
friendliness as perceived from his point of 
view, or according to his norm; and friend- 
liness according to the point of view and 
norm ascribed to the other. 

More formally, the perception of a given 
behavior is differentiated according to the 
following three perceptual facets: 

A. The person doing the action, or actor,
with two elements: a1, the other (nonobserver),
and a2, the self (observer);

B. The level: b1, actual (what is done),
and b2, ideal (what ought to be done);

C. The person from the point of view of
whom the action of a given actor is perceived,
or alias: c1, the other (nonactor),
and c2, the self (actor).

Thus, a given perception of our observer 
can be classified according to the actor, the 
level, and the alias. 

The profiles of the elements of the facets,
taking one element from each facet, define
eight perceptual types. For example: a1b1c2
is the perception of the actual behavior of
the other from the point of view of the
other; a2b2c1 is the perception of the ideal
behavior of self from the point of view of
the other, and so on.

To call “what ought to be done” a per- 
ception is, perhaps, to stretch the meaning 
of this word. It would have been more ap- 
propriate to use a term like norm, value, 
ideal. We have used the word perception in 
this sense for lack of a better term covering 
both actual and ideal behavior. 


The Behavioral Types 


The types of behavior are defined, in a 
manner similar to the perceptual types, by 
the following three facets: 

D. Content of behavior: d1, acceptance
or giving, and d2, rejection or taking away.

E. Object of behavior: e1, the other (nonactor),
and e2, the self (actor).

F. Mode of behavior: f1, social or status,
and f2, emotion or love.

Taking profiles over the elements of
these facets, eight behavioral types are defined,
as for example, d1e2f2, or emotional
acceptance of self; d2e1f1, or social rejection
of other.

This classification of behavior suggests 
the giving or taking away of love and status 
from the self and the other as basic fea- 
tures of interpersonal interaction. Other 
conceptions of this behavior are of course 
possible and have been proposed. The rea- 
sons for choosing this particular formula- 
tion, on the basis of theoretical considera- 
tions and empirical findings, have been 
discussed at length in an earlier paper (Foa, 
1961). Additional findings supporting this 
formulation have been reported by Adams 
(1964). 


Combining the Types 


By combining any perceptual type with
any behavioral type we can define a variable.
For example: a combination of perceptual
type a2b1c2, the perception of the
actual behavior of the self from the point of
view of the self, with behavioral type
d1e1f2, emotional acceptance of the other,
produces a2b1c2d1e1f2, which is the degree
(intensity, frequency) of our observer's
perception of his actual emotional acceptance
of the other. The notion of degree or
intensity or frequency, although not ex- 
plicit in the formal notation, has been added 
to indicate the range of values that a vari- 
able can assume. In the same manner other 
variables can be defined, just by combining 
a perceptual type with a behavioral one. 
Since there are eight of each type, 8 x 8 
= 64 variables have been defined (see 
Table 1). 
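
The count of 64 can be checked by enumerating the facet profiles directly. The sketch below is illustrative only: the dictionary keys and element labels are shorthand for the six facets defined above, and the enumeration order is the plain lexicographic one, which is not the row and column order of Table 1.

```python
from itertools import product

# Each facet has two elements, listed as (element 1, element 2).
PERCEPTUAL_FACETS = {
    "actor": ("other", "self"),        # A: a1, a2
    "level": ("actual", "ideal"),      # B: b1, b2
    "alias": ("nonactor", "actor"),    # C: c1, c2
}
BEHAVIORAL_FACETS = {
    "content": ("acceptance", "rejection"),  # D: d1, d2
    "object": ("other", "self"),             # E: e1, e2
    "mode": ("social", "emotional"),         # F: f1, f2
}


def profiles(facets):
    """Return every profile that takes one element from each facet."""
    names = list(facets)
    return [dict(zip(names, combo)) for combo in product(*facets.values())]


perceptual_types = profiles(PERCEPTUAL_FACETS)   # 2 * 2 * 2 = 8
behavioral_types = profiles(BEHAVIORAL_FACETS)   # 2 * 2 * 2 = 8

# A variable is one perceptual type combined with one behavioral type.
variables = [{**p, **b} for p, b in product(perceptual_types, behavioral_types)]

# The worked example from the text, a2b1c2d1e1f2.
example = {
    "actor": "self", "level": "actual", "alias": "actor",
    "content": "acceptance", "object": "other", "mode": "emotional",
}
```

Here `len(variables)` is 64 and `example` appears among them exactly once.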

Each row of the table defines a per- 
ceptual type, indicated by a Roman nu- 
meral from I to VIII. Each column of the 
table defines a behavioral type, numbered 
from 1 to 8. Each cell of the table, at the 
crossing of a given row with a given column, 
defines a variable made up of the percep- 
tual type of this row and of the behavioral 
type of this column. To indicate a variable
it is, therefore, sufficient to specify the numerals
of its perceptual and behavioral
types. For example, Row I, Column 1 defines
Variable I, 1: “To what degree does
the observer perceive, from his point of
view, that the other (nonobserver) accepts
him socially,” or “To what extent the observer
perceives he is receiving status from
the other.”

Some other examples are: Variable V, 2:
“To what degree, according to the observer,
does the other think that he (the observer)
ought to accept the other emotionally.”

Variable VII, 5: “To what degree does 
the observer feel that he is rejecting himself 
socially (taking away status from him- 
self).” 

Variable VI, 7: “To what degree does the
observer feel that he ought to reject the 
other emotionally (to deny him love).” 

The upper half of the table refers to the 
role of the other toward the observer, the 
lower half to the role of the observer toward 
the other. 

This classification of variables can easily 
be interpreted in terms of social exchange. 
Variables in Rows I-IV, Columns 1-2 and 
7-8, refer to what the nonobserver gives to 
the observer, that is, to what the observer 
receives from the other. Variables in Rows 
V-VIII, same columns, concern what the 
observer gives to the other. The remaining 
variables refer to what each actor gives to 
himself. Two currencies are used, love and 
status. What is and what ought to be given 
and denied is recorded from the point of 
view of the observer as well as from the 
point of view of the other. 

In the eight variables in any given 
column the behavioral type is constant, only 
the perceptual type changes. In the vari- 
ables of the same row, on the other hand, 
the perception is constant, while only the
behavior changes. Thus, for example, the
variables of Row VII refer to the degree to
which each one of the eight behavioral
types appears in the observer’s perception
of his own actual behavior from his point
of view.

TABLE 1
Facet Definition of the Variables of an Observer

Perceptual type                                 Behavioral type
                                      Content   Acceptance              Rejection
                                      Object    Other       Self        Self        Other
                                      Mode      Soc.  Emo.  Emo.  Soc.  Soc.  Emo.  Emo.  Soc.
Actor                 Level   Alias     Type    1     2     3     4     5     6     7     8
Nonobserver or other  Actual  Nonactor  I
                              Actor     II
                      Ideal   Actor     III
                              Nonactor  IV
Observer or self      Ideal   Nonactor  V
                              Actor     VI
                      Actual  Actor     VII
                              Nonactor  VIII

Note. Soc. = social or status; Emo. = emotion or love.

We can, therefore, look at the 64 vari- 
ables column by column keeping the be- 
havioral type constant or row by row hold- 
ing constant the perceptual type. This 
manner of classification will prove useful 
in the formulation of hypotheses about the 
interrelationship among variables. 


THE HYPOTHESES


Each one of the 64 variables may have a 
certain frequency or intensity, varying from 
very little to very much, or from rarely to 
often. The hypotheses are concerned with 
the relationship among the frequencies of 
the variables. It is proposed that the eight 
variables belonging to any given row or 
column of Table 1 will be related in the 
same order in which they appear in the 
table; that is, the nearer any two variables 
are in the table the higher will be their 
correlation. It is further suggested that the 
first and last variable of a given row or 
column will also be fairly close to each 
other. So the intercorrelation matrix of each 
set of eight variables will approximate the 
circumplex pattern (Guttman, 1954). To 
explain how the hypothesis was derived 
some comments on the sequence of develop- 
ment of notions of interpersonal perception 
in children appear appropriate. 
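
The ordering prediction can be made concrete with a small sketch. It assumes, for illustration only, that the eight variables of a row or column are indexed 0 to 7 in their Table 1 order and that the hypothesis is read ordinally: a variable should correlate at least as highly with a nearer neighbor on the circle as with a more distant one. The function names and the idealized matrix are hypothetical and are not data from the study.

```python
def circular_distance(i, j, n=8):
    """Number of steps between positions i and j on a circle of n positions."""
    d = abs(i - j) % n
    return min(d, n - d)


def order_violations(corr):
    """Count cases in which a variable correlates more highly with a more
    distant variable on the circle than with a nearer one."""
    n = len(corr)
    count = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if i in (j, k):
                    continue
                nearer = circular_distance(i, j, n) < circular_distance(i, k, n)
                if nearer and corr[i][j] < corr[i][k]:
                    count += 1
    return count


# Idealized matrix: correlation falls off steadily with circular distance,
# so the first and last variables (distance 1) are as close as neighbors.
ideal = [[1.0 - circular_distance(i, j) / 8 for j in range(8)]
         for i in range(8)]
```

An observed intercorrelation matrix that approximates the circumplex should yield few such violations, while the idealized matrix yields none.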

Let us start with a very simple considera- 
tion. When the adult observer of our study 
interacts with another person he is ap- 
parently able to classify the ongoing be- 
havior in the scheme that we have pre- 
sented. He is able to differentiate between 
what he does and what the other fellow 
does, between what is being done and what 
ought to be done, between acceptance and 
rejection, and so on. On the other hand, 
when a newborn child becomes an observer 
he can make none of these differentiations. 
What is then the sequence of development 
which bridges this enormous gap? Which 
concepts develop first and which next? We 
shall make some proposals regarding the
sequence of development of the perceptual
and behavioral facets which have been 
defined. 

It is proposed that, in the perceptual 
facets, the differentiation between actors 
develops first, followed by the differentia- 
tion between levels and then between 
aliases. The differentiation between self and 
nonself, as actors, seems to occur, in the 
pre-oedipal phase, as soon as the child 
realizes that his behavior and the behavior 
of his mother are not one and the same 
thing. In fact this differentiation provides 
the child with his first two roles: his role 
toward his mother and the role of the 
mother toward him. The differentiation be- 
tween actual and ideal level could not be 
easily made before the actors differentia- 
tion: ideal behavior, in its elementary form, 
is the behavior of the child which is fol- 
lowed by acceptance behavior of the adult; 
thus it seems to require differentiation be- 
tween the two actors, the child and the 
adult. The realization that “what mother 
wants me to do” may be different from 
“what I want to do,” provides the beginning 
of the third differentiation, between the 
point of view of the actor and the point of 
view of the other. Therefore, if differentia- 
tion by level requires prior differentiation 
by actor and contains the beginning of 
differentiation by alias, the suggested se- 
quence of development will be sustained: 
Nonself is differentiated from self, as actor, 
then ideal behavior from actual behavior 
and, finally, the point of view of the other 
from the point of view of the self. This sug- 
gests that initially the child perceives that 
he is the actor of everything being done. 
Self, as actor and alias, and actual are the 
primary elements of the perceptual facets. 
These elements, however, become meaning- 
ful only after their differentiation from the 
other secondary elements. As Piaget (1955, 
p. 237) has noted, “Precisely because he
feels omnipotent, the child cannot yet contrast
his own self with the external world.”

Among the behavioral facets the pro- 
posed sequence is: first, content; then, ob- 
ject; and finally, mode. There are a num- 
ber of considerations suggesting that the 
differentiation between accepting and rejecting
may be made very early in child
life. It may well be the very first inter- 
personal differentiation made by the child. 
Homologous physiological mechanisms, like
inhaling and exhaling, suckling and excret- 
ing, are present from birth. In early psy- 
chomotoric development the child first 
learns to grasp, then to reject things, thus 
establishing a distinction between these two 
types of action. The same sequence occurs 
in the first days of life of imprinting ani- 
mals. Furthermore, it seems that some ele- 
mentary code for classifying and recording 
behavior, like accepting and rejecting, is 
necessary before the notions of object and 
actor become meaningful. The notion of 
what is done seems likely to precede the 
notion of to whom it is done (object) and by
whom (actor). On the other hand the differ- 
entiation of status from affect requires the 
perception of a social group larger than the 
mother-son dyad or the mother-father-son 
triad, in order to occur. Even if the child 
realizes that the parents have higher status 
than himself, this status difference will be 
overlapping with the self-other differentia- 
tion. It is only when the siblings enter the 
social scene of the child that the notion of 
status may have a chance to become es- 
tablished: there are now others who have 
higher status than the child (parents) or a 
similar one (siblings). These considerations
suggest that differentiation of rejection from 
acceptance will occur first and differentia- 
tion of status from affect last, with the 
differentiation of other from self, as objects,
occurring at some intermediate time. 

These three stages of this proposed 
process of concept building by binary fis- 
sion are schematically represented in Figure 
1 for the perceptual facets and in Figure 2 
for the behavioral facets. 

In Figure 1 the inner circle indicates the 
differentiation by actor (observer and non- 
observer). Moving away from the center, 
the next circle shows how the differentiation 
by actor subdivides again according to the 
level: actual and ideal. The third circle in- 
dicates the further subdivision by alias. 
The outer circle shows the eight perceptual 
types numbered from I to VIII resulting 
from the three stage process of successive 
subdivision of concepts. Each successive 
differentiation is obtained by a subdivision 
of the previous concept according to the 
new facet. 

An identical process is shown in Figure 2 
with regard to the behavioral facets. The 
inner circle indicates the differentiation by 
content between acceptance and rejection. 
The next one between objects and the third 
one between modes. The outer circle shows 
the eight behavioral types, numbered from 
1 to 8, resulting from this process. 
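
The successive subdivision can be sketched as a short routine. One assumption is made that the text does not state as a rule: at each stage the two new elements are laid out in mirror-image order in alternate segments, so that neighboring types differ in a single facet. With that assumption the routine reproduces the row order of Table 1 and agrees with the column examples given earlier (Columns 1, 2, 5, and 7). The function name is hypothetical.

```python
def fission_order(facets):
    """Build the circular order of types by successive binary subdivision.

    facets is a list of (element 1, element 2) pairs, in the proposed
    order of development. Every existing segment is split in two; the
    split is mirrored in alternate segments (an assumption).
    """
    order = [()]
    for first, second in facets:
        new_order = []
        for position, profile in enumerate(order):
            pair = (first, second) if position % 2 == 0 else (second, first)
            new_order.extend(profile + (element,) for element in pair)
        order = new_order
    return order


# Perceptual facets: actor, then level, then alias (Types I to VIII).
perceptual = fission_order([
    ("other", "self"),
    ("actual", "ideal"),
    ("nonactor", "actor"),
])

# Behavioral facets: content, then object, then mode (Types 1 to 8).
behavioral = fission_order([
    ("acceptance", "rejection"),
    ("other", "self"),
    ("social", "emotional"),
])

for numeral, profile in zip(
        ["I", "II", "III", "IV", "V", "VI", "VII", "VIII"], perceptual):
    print(numeral, profile)
```

Under this reading, each type differs from its two neighbors on the circle, including the first from the last, in exactly one facet.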

The concept of observer is not included 
in this developmental scheme. A child learns 
to differentiate between actors, aliases, objects,
modes, etc. To do this he has first to
become an observer, but no differentiation
between observers occurs at any stage of
development. What in common parlance is 
often described as differentiation between 
observers is in fact differentiation between 
points of view of the same observer. 

The perceptual types of Figure 1 and the 
behavior types of Figure 2 follow the same 
order in which they are given in Table 1. 
The figures show, however, that the proposed
order of types seems to result from the
process of successive differentiation of con- 
cepts occurring during childhood. Having 
thus explained how the hypothesis was 
derived, we can now turn to the procedure 
for gathering the data and to the presenta- 
tion of the findings. 


PROCEDURE 


The empirical evidence to be presented is pro- 
vided by a study of a sample of 633 married 
couples in Jerusalem, Israel. Husband and wife 
were interviewed separately and simultaneously in 
their home by two field workers. 

Each one of the 64 variables defined in Table 
1 was observed by means of a three-question 
Guttman scale. The scale score indicates the 
degree of perceived occurrence of the variable, as 
reported by the observer. A score was therefore 
obtained for each respondent on each one of the 
variables. In fact, the questionnaire was built 
according to the facet design of the 64 variables 
as given in Table 1. Let us explain how this was 
done: For each one of the eight types of behavior 
and for each actor, three brief stories were pre- 
pared. For example, the three stories referring to 
the husband's social acceptance of the wife run 
as follows: 

Abraham has consideration for his wife and 
displays toward her respect and esteem. 

Isaac thinks his wife is very successful and 
especially esteems her personality and her actions. 

Jacob is sure that everything his wife does is 
important and good and there is no limit to the 
esteem and importance that he attributes to her. 

Some other examples of stories are: 

Social acceptance of self: Abraham is a husband 
who esteems himself and relies on himself and on 
his decisions. 

Emotional acceptance of self: Isaac is a hus- 
band who is satisfied with his actions and feels 
very much at peace with himself. 

Social rejection of wife: Abraham slightly criticizes
his wife's behavior and thinks that she
makes a few mistakes. 

Emotional rejection of self: Jacob is a husband
very dissatisfied with himself and with his behavior
toward his wife, rejects and blames himself.


Similar stories were used for the behavior of the 
wife. 

After each story four questions were asked, to 
differentiate between perceptual types, as follows: 

Actual level, alias of the actor 

Do you behave toward your wife as does the 
husband in the story? (Almost always; generally; 
sometimes; seldom; almost never.) 

Ideal level, alias of the actor 

Do you think that a husband should behave 
as does the husband in the story in relation to 
his wife? 

Actual level, alias of the nonactor 

Would your wife say that you resemble the 
husband in the story? 

Ideal level, alias of the nonactor 

Would your wife say that a husband should be- 
have thusly? 

In this manner, the facet definition of the 
variables (see Table 1) provided the basis for 
constructing the questionnaire: the behavioral 
type and the actor were specified by the story, 
and the remaining two facets of the perceptual 
type, level and alias, by the question. The same 
question was asked three times: once after every 
one of the three stories for a given type of 
behavior and actor. Thus three observations were 
obtained for each variable. They were found to be 
scalable, and a scale score was computed follow- 
ing the usual procedure for Guttman scaling. The 
scale score of each variable was then correlated 
with the score of other variables. 
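The scoring step can be sketched in code. The following is only an illustration and not the study's actual procedure: it assumes that each of the three answers for a variable is dichotomized at a hypothetical cutoff ("generally" or more frequent counts as endorsement), that the three stories are ordered from the mildest to the most extreme, and that the scale score is the number of endorsed stories. The names `dichotomize`, `scale_score`, and `guttman_errors` are introduced here for illustration.

```python
from typing import Sequence

# The five response categories of the questionnaire, from least to most frequent.
CATEGORIES = ["almost never", "seldom", "sometimes", "generally", "almost always"]


def dichotomize(answer: str, cutoff: str = "generally") -> int:
    """Return 1 if the answer is at or above the cutoff category.

    The cutoff is a hypothetical choice; the source does not state one.
    """
    return int(CATEGORIES.index(answer) >= CATEGORIES.index(cutoff))


def scale_score(answers: Sequence[str]) -> int:
    """Number of endorsed stories among the three asked for one variable."""
    return sum(dichotomize(a) for a in answers)


def guttman_errors(answers: Sequence[str]) -> int:
    """Count answers that deviate from the perfect cumulative pattern.

    With stories ordered from mildest to most extreme, a perfect pattern for
    a score of s endorses the first s stories and none of the later ones.
    """
    observed = [dichotomize(a) for a in answers]
    score = sum(observed)
    ideal = [1] * score + [0] * (len(observed) - score)
    return sum(o != i for o, i in zip(observed, ideal))
```

A respondent who answers "almost always" to the mild story, "generally" to the middle one, and "seldom" to the extreme one fits the cumulative pattern; the reverse pattern counts as two deviations under this sketch.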


THE EMPIRICAL STRUCTURE


The intercorrelations between the eight 
variables belonging to the same row or 
column of Table 1 can be arranged in an 8
× 8 matrix. Since there are, in Table 1,
eight rows (and eight columns), it follows 
that eight intercorrelation matrices are ob- 
tained for each observer when the variables 
are taken either by row or by column. The 
study includes two observers, husband and 
wife. Thus, there are, in total, 16 matrices 
to be considered for the rows and an equal 
number for the columns. 

It has been predicted that the intercor- 
relations between the eight variables, in the 
same row or in the same column of Table 1, 
will tend to follow the circumplex pattern. 
In a circumplex the higher correlations are 
found near the main diagonal; moving away 
from the diagonal cell the coefficients de- 
crease and then increase again. As already 
noted, the circumplex suggests a circular
order of the variables, so that the first and 
last variables are also closely related (Guttman,
1954). When the order is open, so that
the first and last variables correlate least,
the pattern is called simplex (Guttman,
1954). Common to these two intercorrelation
patterns is the notion of order among
the variables (Guttman, 1958).


TABLE 2

AN EXAMPLE OF BEHAVIORAL CIRCUMPLEX; CONSTANT PERCEPTUAL TYPE VII; OBSERVER: WIFE

Type    1    2    3    4    5    6    7    8
1      --   65   27   24   06   20   35   45
2      65   --   28   16   09   20   36   40
3      27   28   --   52   28   17   01   07
4      24   16   52   --   31   17  -11  -01
5      06   09   28   31   --   39   18   24
6      20   20   17   17   39   --   34   35
7      35   36   01  -11   18   34   --   53
8      45   40   07  -01   24   35   53   --

Note. Decimal point omitted in all tables.
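The difference between the two patterns can be made concrete with a small check. The sketch below is one possible way to test a matrix against a circular order; it is an assumption of this illustration, not the counting rule used for the deviations reported in the text. For every row it compares each pair of cells and counts a violation whenever the cell farther from the diagonal (in circular distance) holds the larger coefficient. The helper `ideal_circumplex` builds a synthetic matrix for demonstration only.

```python
from typing import List, Sequence


def circular_distance(i: int, j: int, n: int) -> int:
    """Steps between positions i and j on a closed ring of n variables."""
    d = abs(i - j) % n
    return min(d, n - d)


def circumplex_violations(matrix: Sequence[Sequence[float]]) -> int:
    """Count cell pairs in each row whose ordering contradicts a circular order."""
    n = len(matrix)
    violations = 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for a in others:
            for b in others:
                nearer = circular_distance(i, a, n) < circular_distance(i, b, n)
                if nearer and matrix[i][a] < matrix[i][b]:
                    violations += 1
    return violations


def ideal_circumplex(n: int, top: float = 0.6, step: float = 0.15) -> List[List[float]]:
    """Synthetic matrix whose coefficients fall off with circular distance."""
    return [
        [1.0 if i == j else top - step * (circular_distance(i, j, n) - 1) for j in range(n)]
        for i in range(n)
    ]
```

A matrix whose coefficients fall off with the plain distance |i - j|, so that the first and last variables correlate least, is a simplex and produces violations under this check.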


The Order of Behavior Types 


Let us consider the intercorrelations 
among variables belonging to the same row 
of Table 1. In each one of these sets of eight 
variables the perceptual type and the ob- 
server are constant, while the behavioral 
type changes. Thus, each matrix shows the 
relationship among the eight types of be- 
havior for a given perceptual type, that is, 
for a given actor, level and alias of one of 
the two observers. 

The example of Table 2 refers to per- 
ceptual Type VII of the observer wife. The 
full set of 16 matrices has been published 
earlier (Foa, 1962) and is also reported in 
the appendix.3 To give the variable scores
a single meaning, from unfavorable (low acceptance
or high rejection) to favorable
(high acceptance or low rejection) to the
interpersonal relation, the scores of the rejection
variables were reversed. This explains
the positive correlation coefficient
between acceptance and rejection: it just 
means that the more frequent the accept- 
ance the less frequent the rejection. 
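The effect of reversing the rejection scores can be illustrated with a few lines of code. This is a minimal sketch with invented scores: it assumes a scale score running from 0 to 3 (the range of a three-question scale under a simple count, which the source does not state) and shows that reversal changes the sign of the correlation but not its size.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Product-moment correlation between two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def reverse(scores: Sequence[int], low: int = 0, high: int = 3) -> List[int]:
    """Reverse a score so that a high value means low rejection (assumed 0-3 range)."""
    return [low + high - s for s in scores]


# Invented scores for six respondents, for illustration only.
acceptance = [3, 2, 2, 1, 0, 3]
rejection = [0, 1, 0, 2, 3, 1]

raw = pearson(acceptance, rejection)
reversed_r = pearson(acceptance, reverse(rejection))
```

Here `raw` is negative and `reversed_r` is its mirror image, which is why acceptance and rejection variables correlate positively in the tables.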


3 See the technical appendix to this paper, which
has been deposited with the American Documentation
Institute. Order Document No. 9026 from ADI
Auxiliary Publications Project, Photoduplication
Service, Library of Congress, Washington, D. C.
20540. Remit in advance $2.50 for microfilm or
$6.25 for photocopies, and make checks payable to:
Chief, Photoduplication Service, Library of Congress.


The predicted order of behavioral types is 
rather well supported by the correlations 
of Table 2: the coefficients are high near 
the main diagonal and then decrease and 
increase again. In the first row of the table, 
for example, the coefficients decrease as one 
moves from the first column to the right, 
reach the lowest point in Column 5, then 
increase again gradually. In this matrix 
there are four deviations from the predicted 
pattern: the coefficients between Variables 
1-7, 2-4, 5-7, and 6-7 are lower than ex- 
pected. Some of the other 15 matrices are 
even better than this one, having as few as 
one or two deviations. The largest number 
of deviations is nine (in two cases), and the 
median number of deviations for all the 16 
matrices is six. Some of these deviations 
tend, however, to be systematic: correlation 
between Type 2 and Type 4, for example,
is always lower than expected. The correla- 
tion between Type 6 and Type 7 also is 
often too low. 
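The overall trend in Table 2 can be summarized by averaging the coefficients at each circular distance from the diagonal. The sketch below uses the coefficients of Table 2 as transcribed from the printed table; the two cells that are unreadable in the upper part of the copy (Types 6-7 and 7-4) are taken from their mirror-image cells in the symmetric matrix. The function names are introduced for this illustration.

```python
from typing import Dict, List, Optional

# Table 2 coefficients (decimal points omitted); None marks the diagonal.
TABLE_2: List[List[Optional[int]]] = [
    [None, 65, 27, 24, 6, 20, 35, 45],
    [65, None, 28, 16, 9, 20, 36, 40],
    [27, 28, None, 52, 28, 17, 1, 7],
    [24, 16, 52, None, 31, 17, -11, -1],
    [6, 9, 28, 31, None, 39, 18, 24],
    [20, 20, 17, 17, 39, None, 34, 35],
    [35, 36, 1, -11, 18, 34, None, 53],
    [45, 40, 7, -1, 24, 35, 53, None],
]


def ring_distance(i: int, j: int, n: int = 8) -> int:
    """Steps between two positions on a closed ring of n types."""
    d = abs(i - j) % n
    return min(d, n - d)


def mean_by_distance(matrix: List[List[Optional[int]]]) -> Dict[int, float]:
    """Average coefficient of all pairs of types at each circular distance."""
    n = len(matrix)
    groups: Dict[int, List[int]] = {}
    for i in range(n):
        for j in range(i + 1, n):
            groups.setdefault(ring_distance(i, j, n), []).append(matrix[i][j])
    return {d: sum(v) / len(v) for d, v in sorted(groups.items())}


def lowest_column(matrix: List[List[Optional[int]]], row: int) -> int:
    """1-based column holding the smallest off-diagonal coefficient of a row."""
    cols = [j for j in range(len(matrix)) if j != row]
    return min(cols, key=lambda j: matrix[row][j]) + 1
```

For these values the averages at distances 1 to 4 are 43.375, 27.0, 15.75, and 6.5, so the coefficients fall steadily as one moves away from the diagonal around the circle, and the first row reaches its lowest point in Column 5.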

It will be noted that contiguous variables 
are not equally spaced: types of behavior 
referring to the same object (self or other) 
like 1-2 and 3-4, are usually nearer or 
more correlated than behavior types refer- 
ring to a different object, as 2-3 and 6-7. 
The correlation between social acceptance 
and social rejection of self (Types 4 and 5) 
is also often low. These features are, how- 
ever, quite compatible with the predicted 
order. 

This order suggests certain predictions 
with regard to the relationship between ac- 
ceptance and rejection, as well as between 
self and other, which may be of interest. 
Ambivalence of emotional feeling (acceptance
and rejection) occurs at an early stage
of development of the child, before the 
social mode is differentiated from the emo- 
tional one. The order of behavior types sug- 
gests that, even in the adult, more ambiv- 
alence is possible at the emotional mode 
than at the social one: one may give and 
deny affect to the same person, but status 
requires a more clear-cut decision. In the 
proposed order social acceptance and re- 
jection of other (Types 1 and 8) and of 
self (Types 4 and 5) are neighbors and 
therefore expected to be closely (and in- 
versely) related. On the other hand, emo- 
tional acceptance and rejection of other 
and self (Types 2-7 and 3-6) are farther 
apart in the order and probably less inter- 
related than the corresponding social types. 
Inspection of the 16 intercorrelation mat- 
rices shows that this prediction is supported 
in 26 out of 32 comparisons. Interestingly 
enough, all six deviations except one
occur in the matrices referring to the point 
of view of the other, that is the point of 
view which develops later in the child. 

The proposed order of behavior types 
also suggests that self and other will be 
more related at the emotional mode than at 
the social one. Types 2-3 and 6-7, concern- 
ing emotional behavior toward self and 
other, are indeed neighbors, while the cor- 
responding social types, 1-4 and 5-8, are 
not. This hypothesis, like the previous one, 
rests on a developmental rationale: the dif- 
ferentiation between self and other, like the 
differentiation between acceptance and re- 
jection, is less strong at the primary emo- 
tional mode than at the social mode, which 
appears later in the development sequence. 
Inspection of the 16 matrices shows that 
this hypothesis is supported only in half 
of the cases, the result expected under 
chance conditions. When, however, the 
matrices are classified according to the 
facets of their constant perceptual type 
some interesting differences emerge. In the 
matrices referring to actual behavior of 
self (an early type of perception), the re- 
lationship between self and other is always 
higher at the emotional than at the social 
level, as predicted. The hypothesis is not 
supported in the matrices referring to the
later perceptions (ideal behavior and
behavior of other).
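Both predictions rest on which behavior types are neighbors in the proposed circular order. The sketch below encodes the eight types as they are identified in the text (for example, Types 1 and 8 as social acceptance and rejection of other, Type 2 as emotional acceptance of the other) and computes the circular distance between types that differ in a single facet. The dictionary layout and function names are choices made for this illustration.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# (mode, content, object) of each behavior type, as identified in the text.
BEHAVIOR_TYPES: Dict[int, Tuple[str, str, str]] = {
    1: ("social", "acceptance", "other"),
    2: ("emotional", "acceptance", "other"),
    3: ("emotional", "acceptance", "self"),
    4: ("social", "acceptance", "self"),
    5: ("social", "rejection", "self"),
    6: ("emotional", "rejection", "self"),
    7: ("emotional", "rejection", "other"),
    8: ("social", "rejection", "other"),
}


def ring_distance(a: int, b: int, n: int = 8) -> int:
    """Steps between two types on the closed circular order."""
    d = abs(a - b) % n
    return min(d, n - d)


def pairs_differing_only_in(facet: int) -> List[Tuple[int, int]]:
    """Pairs of types that agree on every facet except the one at index `facet`."""
    return [
        (a, b)
        for a, b in combinations(BEHAVIOR_TYPES, 2)
        if all(
            (BEHAVIOR_TYPES[a][k] == BEHAVIOR_TYPES[b][k]) != (k == facet)
            for k in range(3)
        )
    ]


# Facet index 1 is content (acceptance-rejection); index 2 is object (self-other).
content_gaps = {p: ring_distance(*p) for p in pairs_differing_only_in(1)}
object_gaps = {p: ring_distance(*p) for p in pairs_differing_only_in(2)}
```

In this encoding the acceptance-rejection pairs are one step apart at the social mode (1-8, 4-5) and three steps apart at the emotional mode (2-7, 3-6), while the self-other pairs are one step apart at the emotional mode (2-3, 6-7) and three steps apart at the social mode (1-4, 5-8).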

The matrix of Table 2 refers to the per- 
ception of the actual behavior of self from 
the point of view of self. In terms of devel- 
opment, this is perhaps the first type of per- 
ception that occurs to a child. It is not 
without significance that in this matrix, 
which refers to the observer wife, as well 
as in the corresponding matrix for the hus- 
band, both the above hypotheses are fully 
supported. Furthermore, in the correspond- 
ing matrix for the husband, there is only 
one deviation from the circumplex pattern. 
Thus, while the predicted order of be- 
havior types seems to be supported by the 
16 matrices, the tendency to deviate from 
it may be somewhat stronger in matrices 
referring to types of perception which oc- 
cur later in the development of the child. 
A closer analysis of these deviations sug- 
gests tentatively that they may be due to a 
tendency of the facets to become less inter- 
dependent, to break away from the hier- 
archical pattern resulting from the develop- 
ment sequence. Thus the structure would 
move toward the cubex (Foa, 1965), each
one of the three facets acquiring a dimen- 
sion of its own. If this is true, it may be 
possible that better circumplexes will be 
found in younger individuals, with increas- 
ing deviations toward the cubex model, as 
the individual becomes more mature. Ma- 
turity, in this sense, would mean that the 
criteria of behavioral differentiation become 
more independent from their sequence of 
development in childhood. 


The Order of Perceptual Types 


In the variables appearing in the same 
column of Table 1, the behavior type and 
observer are constant, while the perceptual 
type changes. Thus the matrix of inter- 
correlations of these variables shows how 
the frequency of the given behavior in one 
actor is related to the frequency of the same 
behavior in the other actor, both at the 
actual and ideal levels, and from the points 
of view of both aliases. Again there are 16 
matrices to be considered, one for each 
column of Table 1 for the observer husband 
and the same number for the observer wife 
(see Footnote 3). One example is given in
Table 3. It refers to behavioral Type 2, 
emotional acceptance of the other, of the 
wife. In this example, the predicted order of
perceptual types is very well supported. 
There is only one deviation, the correlation 
between III and VI being lower than ex- 
pected. In all the other matrices the number 
of deviations is higher: as many as 10 in one
case and a median of 5 for all the matrices.
In all these matrices the relationship be- 
tween actual and ideal behavior is always 
higher for the point of view of the actor 
(Types II-III and VI-VII) than for the 
point of view of the other (Types I-IV and 
V-VIII). The developmental explanation, 
suggested for the behavior types, may serve 
again here: actual-ideal differentiation is 
less sharp for the primary element of the 
alias facet, the actor, than for the element, 
other, which appears later. 

These perceptual matrices also show that
the relationship between the behaviors of 
the two actors is closer at the actual level 
than at the ideal level when the object of 
the behavior is the other, that is, in the 
matrices referring to the two first and last 
columns of Table 1. When the behavior is 
directed toward the self (Columns 3 and 6 
of the same table), the contrary happens: 
the behaviors of the two actors are related 
more at the ideal level than at the actual 
one. The correlation between Types I and 
VIII (actual level) is higher than the cor- 
relation between IV and V (ideal level) 
when the object of behavior is the other,
and lower when the object is self. This oc- 
curs in all the matrices, except the one of 
Table 3. Thus the relationship between the
reciprocal behaviors of the two actors is
stronger than the relationship between the 
respective norms. When, however, the be- 
havior is directed toward oneself, the norms 
of the two actors are related more than their 
actual behaviors. In the latter case, Types
I and VIII are rather apart, so that the 
order looks more like a simplex than a cir- 
cumplex. In the former case, however, the 
distance between Types IV and V is not 
large enough to change the tendency toward 
a circular order. 

In spite of the gaps just noted, these 
perceptual matrices indicate a fairly close 
relationship between the behaviors of the 
two actors, husband and wife. Elsewhere 
(Foa, Triandis, & Katz, in press), it has 
been shown that such relatively high rela- 
tionship can be expected when the two actors 
occupy similar positions in the power struc- 
ture of the family system. This study also 
suggests that the correlations between the 
two actors are likely to be lower when there 
is more difference in power between the two 
roles, as for example, between father and 
son, and possibly more so for the behavior 
types dealing with status than for those 
dealing with affect. 

So far we have considered the relation- 
ship among the same perception of different 
types of behavior and the relationship 
among different perceptions of the same 
type of behavior. The results tend to sup- 
port the hypotheses of order among these 
types. An attempt has been made to explain 
certain features of this order as well as some 
fairly systematic deviations from it, accord- 
ing to the same rationale which served for 
generating the hypotheses: the sequence of
development of interpersonal perception in
the child. These features of the order of
types will have to be taken into account as
we turn to the next problem, the relationship
among variables when both the perceptual
and behavioral facets are different,
that is, variables belonging to different rows
and columns of Table 1. Thus, for example,
relating what we receive from the other to
what we give to ourselves involves variables
in different rows and columns.


TABLE 3
AN EXAMPLE OF PERCEPTUAL CIRCUMPLEX; CONSTANT BEHAVIORAL TYPE II; OBSERVER: WIFE

Type     I    II   III    IV     V    VI   VII  VIII
I       --    71    59    42    00    26    45    52
II      71    --    67    55    14    30    40    51
III     59    67    --    61    26    34    31    44
IV      42    55    61    --    53    46    32    36
V       00    14    26    53    --    47    33    33
VI      26    30    34    46    47    --    56    47
VII     45    40    31    32    33    56    --    61
VIII    52    51    44    36    38    47    61    --


Central and Peripheral Variables 


The prediction of the relationship be- 
tween variables taken from different rows 
and columns of Table 1 seems to require 
some understanding of the manner in which 
the set of 64 variables is organized as a 
whole. For this purpose, it may be useful to 
consider the position of each variable with 
respect to the set, as indicated, for example, 
by the multiple correlation of a variable to 
the other ones. Some of these multiple cor- 
relations may be higher and some may be 
lower. Essentially this is the problem of 
communality in factor analysis (Foa, 
1963). It has been recognized that the com- 
munality of a variable is relative to the 
set of variables to which it belongs rather 
than a property of the variable itself. The 
set of variables considered in this study is 
interpersonal behavior. Thus all the varia- 
bles are interpersonal, but, to paraphrase 
George Orwell, some variables may be more 
interpersonal than other ones. If it is so, 
we may expect that the multiple correlation
of the more interpersonal variables will
be higher: they belong to the set more than
the other, less interpersonal ones. 

To decide the degree to which a variable 
is interpersonal, consider its facet elements.
Each facet has two elements. These pairs of 
elements are: self-other; emotional-social; 
acceptance-rejection; and actual-ideal. In 
each pair one of the two elements appears 
to be more interpersonal than the other one.
Relations with other are characteristic of 
interpersonal behavior, while relations with 
self are not. Social behavior is interpersonal, 
while emotional behavior need not be so. 
Some degree of acceptance is necessary for 
the interpersonal relationship to continue, 
while rejection leads to the cessation of the 
relationship. There is no interpersonal situ- 
ation without actual behavior, but an ideal 
can be maintained without reference to the 
other. Thus, the elements social, other, ac- 
ceptance, and actual, are more closely as- 
sociated with interpersonal behavior than 
the elements emotional, self, rejection, and 
ideal. 
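The reasoning above can be turned into a simple tally. The sketch below counts, for each combination of the four facet pairs just listed, how many of its elements belong to the more interpersonal side. The unweighted count is an assumption of this illustration; the text does not give a numerical rule, and the remaining facets of the perceptual type (actor and alias) are left out because they are not ranked in this passage.

```python
from itertools import product
from typing import Dict, Tuple

# The four facet pairs named in the text; the first element of each pair
# is the one described as more closely associated with interpersonal behavior.
FACETS: Dict[str, Tuple[str, str]] = {
    "level": ("actual", "ideal"),
    "mode": ("social", "emotional"),
    "content": ("acceptance", "rejection"),
    "object": ("other", "self"),
}
INTERPERSONAL = {"actual", "social", "acceptance", "other"}


def interpersonal_count(profile: Tuple[str, ...]) -> int:
    """Number of elements in the profile that are on the interpersonal side."""
    return sum(element in INTERPERSONAL for element in profile)


# All 16 combinations of the four facets, most interpersonal first.
profiles = list(product(*FACETS.values()))
ranked = sorted(profiles, key=interpersonal_count, reverse=True)
```

Under this tally the profile at the top of the ranking is actual social acceptance of the other, and the one at the bottom is ideal emotional rejection of self.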

The variable containing only interpersonal
elements is Variable I,1, situated at
the upper left corner of Table 1: actual 
social acceptance of the other by the other 
from the viewpoint of the other. The varia- 
ble containing none of these elements is 
VI,6: ideal emotional rejection of self by 
self from the viewpoint of self. It is pre- 
dicted that multiple correlation will be high 
for the most interpersonal variable and will 
decrease as one moves toward the least interpersonal
one. To test this hypothesis the
multiple correlation of each variable with
the other variables of the same behavioral
circumplex was computed. It would have
been more appropriate to use all the remaining
63 variables of the set in computing
each multiple correlation, but the labor
involved would have been prohibitive.
These multiple correlations are given in Table 4
for the observer wife and in Table 5
for the observer husband.


TABLE 4
MULTIPLE CORRELATIONS OF THE 64 VARIABLES FOR THE OBSERVER WIFE

Perceptual               Behavioral type
type           1    2    3    4    5    6    7    8
I             78   78   65   70   52   45   75   74
II            77   75   73   72   49   47   63   61
III           79   78   66   68   47   44   61   64
IV            71   67   66   65   46   46   51   56
V             68   63   56   55   51   48   47   52
VI            70   63   56   58   49   44   47   51
VII           71   68   57   60   52   51   61   64
VIII          80   77   72   68   49   56   63   67

Note. Each variable is defined by a given profile of perceptual and behavioral type.


TABLE 5
MULTIPLE CORRELATIONS OF THE 64 VARIABLES FOR THE OBSERVER HUSBAND

Perceptual               Behavioral type
type           1    2    3    4    5    6    7    8
I             78   74   63   66   51   45   72   75
II            78   76   70   68   52   39   67   68
III           76   71   68   66   52   46   66   64
IV            67   63   67   63   46   33   58   59
V             67   67   65   62   63   60   64   62
VI            67   63   65   67   59   59   50   52
VII           73   70   67   65   49   45   64   69
VIII          82   77   66   64   52   53   68   73
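The multiple correlation of one variable with the remaining variables of its circumplex can be obtained directly from the intercorrelation matrix. The sketch below uses a standard identity, R_i^2 = 1 - 1/c_ii, where c_ii is the i-th diagonal element of the inverse correlation matrix. The source does not say how the coefficients were actually computed, so this is an assumption, and the function names are introduced here for illustration. The three-variable example uses the coefficients among Types 1, 2, and 3 of Table 2 with decimal points restored.

```python
from math import sqrt
from typing import List

Matrix = List[List[float]]


def invert(matrix: Matrix) -> Matrix:
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def multiple_correlations(corr: Matrix) -> List[float]:
    """Multiple correlation of each variable with all the others in the matrix."""
    inverse = invert(corr)
    return [sqrt(max(0.0, 1.0 - 1.0 / inverse[i][i])) for i in range(len(corr))]


# Coefficients among Types 1, 2, and 3 of Table 2, decimal points restored.
example = [
    [1.00, 0.65, 0.27],
    [0.65, 1.00, 0.28],
    [0.27, 0.28, 1.00],
]
r_first = multiple_correlations(example)[0]
```

For three variables the result can be checked against the closed form R^2 = (r12^2 + r13^2 - 2 r12 r13 r23) / (1 - r23^2); for a full circumplex the same function would be applied to the 8 × 8 matrix.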


FIG. 3. A representation of the ringex model.


In Tables 4 and 5, the variables are exactly
in the same order as in Table 1, but
now we can interpret this order as going
from the most to the least interpersonal
variable, according to the component-like
behavior of the facet elements (Foa, 1965).
As predicted, the coefficients of multiple correlation
follow this order closely: they
are high in the top left corner and decrease
moving to the right or down until the sixth 
column or row is reached, then start in- 
creasing again. Along the diagonals of each 
table this trend is even more regular. Since 
the more interpersonal variables have a 
multiple correlation higher than the less 
interpersonal ones, their position in the
configuration should be more central than the
position of variables which are less specifi- 
cally interpersonal. 

In this manner we have gained some in- 
formation with regard to the configuration 
of the variables: it should be shaped in such 
a way as to permit a distinction between the 
center and the periphery. It should also 
account for the intersecting circles of eight 
variables which have been described previ- 
ously. The anchor ring or torus, a figure 
shaped like the inner tube of a car, a dough- 
nut, or a bagel, fulfills these requirements. 
In an anchor ring the points situated in the 
inner part of the surface, where the air 
valve is found in a car tube, are more cen- 
tral than those situated in the outer part. 
We have called a statistical structure of 
variables, arranged on the surface of an 
anchor ring, a ringex. The proposed ringex 
structure of our 64 variables is portrayed 
in Figure 3. 


Correspondence between Conceptual and 
Empirical Structures 


This figure is an attempt to represent the 
empirical relationship among the variables 
of Table 1. The eight large circles of the
figure are the behavioral circumplexes and 
correspond to the rows of Table 1: each of 
these circles is made up of the same vari- 
ables which are found in a given row of 
Table 1. The small circles represent the 
order of perceptions; each one of them
corresponds to a given column of Table 1.
Each behavioral circle crosses each per- 
ceptual circle once, and this point of inter- 
section defines the position of a given 
variable with respect to the other ones. It 
corresponds indeed to a cell of Table 1. 
Thus, Figure 3 depicts the proposed empirical
interrelationship among the variables,
while Table 1 depicts their conceptual
relationship in terms of facet elements. It
is remarkable how closely these two repre- 
sentations correspond to one another. In 
effect, the ringex structure can be obtained 
by folding Table 1 so that the bottom row 
will join with the top row and then folding 
it again in the other dimension so that the 
first and last columns will also become 
neighbors. Reversing the order of these 
two operations would have produced a 
different ringex, but data to be presented 
later tend to support the ringex of Figure 
3. 
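The folding operation can be expressed as a mapping from the cells of an 8 × 8 table to points on the surface of a torus. In the sketch below the row index determines the position around the tube, so that each column forms a small circle, and the column index determines the position around the central axis, so that each row forms a large circle. The radii and the placement of the first row on the inner side are arbitrary assumptions made for illustration; they are not fitted to the data.

```python
from math import cos, dist, isclose, pi, sin
from typing import Tuple


def ringex_point(
    row: int, col: int, n: int = 8, major: float = 3.0, minor: float = 1.0
) -> Tuple[float, float, float]:
    """Place cell (row, col) of an n x n table on the surface of a torus."""
    phi = pi + 2 * pi * row / n      # around the tube; row 0 on the inner side (assumed)
    theta = 2 * pi * col / n         # around the central axis
    ring = major + minor * cos(phi)  # radius of the large circle for this row
    return (ring * cos(theta), ring * sin(theta), minor * sin(phi))


def behavioral_circle_radius(
    row: int, n: int = 8, major: float = 3.0, minor: float = 1.0
) -> float:
    """Radius of the large (behavioral) circle formed by one row of the table."""
    return major + minor * cos(pi + 2 * pi * row / n)
```

With this mapping the last row is as close to the first row as the second row is, and likewise for columns, which is what joining the edges of the table achieves. Exchanging the roles of rows and columns in the mapping corresponds to reversing the order of the two folding operations and yields the other ringex.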


Evidence for the Ringex Hypothesis 


If the ringex of Figure 3 is a correct 
portrayal of the interrelationship among 
the variables, it will also be expected that: 

1. The average correlation among vari- 
ables belonging to the same behavior circle 
will be highest for the circles in the internal 
portion of the surface and will decrease 
gradually as one moves toward the ex- 
ternal part. Internal circles are indeed 
smaller than the external ones. The smaller 
the circle the larger the correlation, on the 
average, among its eight variables. Cor- 
relation is inversely related to distance. 

2. The average correlation among vari- 
ables belonging to the same perceptual 
circle will always be higher than the aver- 
age correlation among the variables of 
any given behavioral circle. Perceptual 
circles are, in the figure, always smaller 
than even the smallest behavioral circle 
so that their variables should correlate 
higher. 

The average correlations for each circle 
and for each observer are given in Table 
6. 

For simplicity’s sake the circumplexes 
of Table 6 are all indicated by an Arabic 
numeral. It would have been more correct 
to denote a behavioral circumplex by the 
Roman numeral of the perceptual type 
which is constant in it. Both hypotheses
appear well supported. The average cor- 
relations for the behavioral circumplexes, 
given in the first two columns of Table 6, 
are larger for the smaller circles found in 
the inner portion of the ringex (a large 
correlation indicates a small circle) and 
decrease in size as one moves to the outer
portion where circles are larger. On the
other hand, the changes in average correla- 
tion of the perceptual circumplexes, last 
two columns of the Table, do not appear to 
be related to their order position. In fact, 
there is nothing in the ringex structure to 
suggest such a relationship. 

The lowest correlation of the perceptual
circles is .44; the highest one of the be- 
havioral ones, .31. Since a large correlation 
indicates a small circle, this finding sup- 
ports the suggestion of Figure 3 that the 
perceptual circles are always smaller than 
the smallest behavioral one. 
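The comparison just described can be reproduced from the average correlations of Table 6. The sketch below transcribes the table (decimal points omitted, as in the original) and checks the second hypothesis: every perceptual average should exceed every behavioral average. The dictionary layout and names are choices made for this illustration.

```python
from typing import Dict, List, Tuple

# Table 6: circumplex -> (behavioral wife, behavioral husband,
#                         perceptual wife, perceptual husband)
TABLE_6: Dict[int, Tuple[int, int, int, int]] = {
    1: (27, 29, 54, 60),
    2: (25, 28, 44, 53),
    3: (23, 25, 64, 70),
    4: (21, 24, 59, 65),
    5: (18, 21, 48, 58),
    6: (18, 23, 54, 58),
    7: (27, 26, 57, 65),
    8: (28, 31, 47, 58),
}

behavioral: List[int] = [v for row in TABLE_6.values() for v in row[:2]]
perceptual: List[int] = [v for row in TABLE_6.values() for v in row[2:]]


def perceptual_always_higher() -> bool:
    """True when the smallest perceptual average exceeds the largest behavioral one."""
    return min(perceptual) > max(behavioral)
```

The smallest perceptual average is 44 and the largest behavioral average is 31, so the check holds for both observers.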

Different perceptions of the same be- 
havior are, on the average, more inter- 
related than the same perception of differ- 
ent behaviors. This may not be surprising 
considering that perceptual types relate 
actual and ideal behavior from the point
of view of self and other. Realizing that the 
perception or norm of the other is different 
from our own, or that there is a discrepancy
between actual behavior and the
corresponding norm may generate stress 
and set in motion mechanisms for re- 
ducing the gap. As a result, correlations 
between actual and ideal level, as well as 
between points of view, tend to be rela- 
tively high. Whether or not the correlations 
between actors will also be high, may de- 
pend on the relative power position of the 
two roles and on the culture. In the same 
culture, father and son, who have different 
power positions, are less related than hus- 
band and wife, but both these pairs of 
roles are more related in the American 
culture than in the Japanese one (Foa, 
Triandis, & Katz, in press). 

Differences among behaviors, on the 
other hand, may be less conducive to strain. 
Perceiving a discrepancy between status 
and love, between behavior toward self and 
other, and even between acceptance and re- 
jection, may not necessarily produce strain. 
In fact, these differentiations appear to be 
embedded in the cultural values, so that 
members of a certain culture are trained 
to make those differentiations which appear 
to be of particular significance to their 
specific culture (Foa, 1964). Thus, differentiating
among behaviors in a manner
consonant with the cultural values may even
facilitate the adjustment of the individual
to this culture. This may explain why the
perceptual correlations are larger than the
behavioral ones.


TABLE 6
AVERAGE CORRELATIONS OF THE BEHAVIORAL AND
PERCEPTUAL CIRCUMPLEXES FOR EACH
OBSERVER (WIFE AND HUSBAND)

                  Behavioral           Perceptual
Circumplex     Wife   Husband       Wife   Husband
1               27      29           54      60
2               25      28           44      53
3               23      25           64      70
4               21      24           59      65
5               18      21           48      58
6               18      23           54      58
7               27      26           57      65
8               28      31           47      58

Another difference revealed by Table 6 
is that the correlations for the husband are 
always higher than the corresponding cor- 
relations of the wife, except in one case. 
Comparing the multiple correlations of hus- 
band and wife, of Tables 4 and 5, leads to 
the same conclusion: the wife has a finer, 
more differentiated picture of the relation- 
ship than the husband. It may be that the 
wife has fewer roles in society than the hus- 
band, so she can afford to “specialize” in 
the particular task of wife. The husband’s 
specialization may show itself in a greater 
differentiation among more different roles 
compensated by the avoidance of over- 
specialization in the specific role of hus- 
band. This explanation is consistent with 
certain other findings (Foa, 1964; Foa & 
Chemers, 1966), suggesting that there are 
limits to the ability of a person to differ- 
entiate between and within roles while 
maintaining self-identity. Thus, overdiffer- 
entiation in one area may be counter- 
balanced by underdifferentiation in another 
area. 


The Relationship among Behavior Circles 


The evidence presented so far provides 
some measure of support for the ringex 
model. Pursuing the empirical test further 
we attempt to predict the relationship
between variables belonging to different
behavioral or perceptual circles.

Consider the correlations among vari- 
ables belonging to any two behavioral 
circles: they show how the eight behavior 
types interrelate when the actor, level 
and/or alias change. Some of these correla- 
tions may be of particular interest. Criteria 
for judging whether a particular set of 
intercorrelations is interesting or not may 
vary, but one is likely to be more interested 
in the relationship between behavior types 
which differ in one perceptual facet only, 
rather than in those differing in two or 
three facet elements. Thus, for example, 
the relationship between the actual be- 
haviors of the two actors, or between actual 
and ideal behavior of the same actor, may 
be thought of as more interesting than, 
say, the relationship between the actual be- 
havior of one actor from his point of view 
and the ideal behavior of the other actor 
from the point of view of the other. The 
advantage of the model is that all these 
relationships are considered in a systematic 
manner.

In Figure 3 the behavioral circles are 
shown as being roughly concentric. A variable
in one circle appears to be nearest to
the corresponding variable in the other 
circle. For example, among the variables of 
Circle VI, the one which appears nearest to 
Variable 1 of Circle V is precisely Variable 
1. If this representation is correct, any 
variable on a behavioral circle will correlate
highest with the corresponding variable
on another behavioral circle; its correlation
with the other variables of the
second circle will then decrease and
increase again, following the usual circumplex
model. 
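This prediction can be checked row by row in a matrix relating two behavioral circles. The sketch below is an illustration on synthetic coefficients, not on the study's data: it reports, for each row, whether the largest coefficient falls in the corresponding column. The function names and the linear fall-off used to build the example are assumptions.

```python
from typing import List, Sequence


def ring_distance(i: int, j: int, n: int) -> int:
    """Steps between two positions on a closed ring of n variables."""
    d = abs(i - j) % n
    return min(d, n - d)


def corresponding_is_highest(cross: Sequence[Sequence[float]]) -> List[bool]:
    """For each row, is the largest coefficient in the column of the same type?"""
    return [
        max(range(len(row)), key=lambda j: row[j]) == i
        for i, row in enumerate(cross)
    ]


def synthetic_cross(n: int = 8, top: float = 0.6, step: float = 0.12) -> List[List[float]]:
    """Coefficients between two circles that fall off with circular distance."""
    return [[top - step * ring_distance(i, j, n) for j in range(n)] for i in range(n)]
```

A row that fails the check marks a variable whose closest relative on the other circle is not its own counterpart, which is one kind of deviation from the predicted pattern.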

Several examples of intercorrelation
matrices for various pairs of behavioral 
circles are given in the appendix (see Foot- 
note 3). They follow fairly closely the 
predicted pattern. The number of devia- 
tions is fairly small but tends to be larger 
when the behavior of the two actors is 
interrelated, as in Matrices IV-V, I-VIII, 
I-VI, and I-VII. One example of the inter- 
correlation between actors is given in Table 
7. This table relates the actual behaviors of 
husband and wife, as perceived by the hus- 
band from his point of view. In this par- 
ticular example, there is a rather large 
number of deviations, more than in most 
other matrices of this type; nevertheless 
the tendency toward an ordered pattern is 
readily apparent. This order suggests that 
what the husband gives (or denies) to his 
wife (in affect and status) is more related 
to what she gives (or denies) to him, than 
to what she gives to herself. On the other 
hand, what the husband gives to himself 
is more related to what the wife gives to 
herself than to what she gives to him. 
Within a given level, actual or ideal, be- 
haviors toward the other go together, as do 
behaviors toward the selves of the two 
actors. The relationship between self and 
other becomes different, however, when the 
two levels are compared, as is done in the correlations
between perceptual circles, discussed
in the next section.


TABLE 7
AN EXAMPLE OF INTERCORRELATIONS BETWEEN TWO BEHAVIORAL CIRCLES

Perceptual Type VII          Perceptual Type I: Behavioral Type
Behavioral Type          1    2    3    4    5    6    7    8
1                       60   55   39   29   23   13   29   31
2                       52   53   31   25   10   15   18   26
3                       41   30   56   85   12   06   07   10
4                       38   22   39   68   08   01   03   09
5                       23   05   14   08   35   22   22   22
6                       22   21  -07  -04   23   37   29   34
7                       31   34    M   -0   31   29   57   61
8                       42   45   18   05   30   26   48   55


Relationship among Perceptual Circles 


Let us now turn to the relationship 
among variables appearing on two different 
perceptual circles, that is, the relation- 
ship between the various perceptions of 
two given types of behavior of the two
actors. Some of these correlations relate 
the behavior of one actor toward himself 
to his behavior toward the other. The inter- 
correlations between the variables of per- 
ceptual Circles 2 and 3 indicate, for ex- 
ample, how emotional acceptance of self 
is related to feeling emotionally accepted 
by the other; Circles 1 and 4, on the other 
hand, relate giving status to self to receiv- 
ing status from the other. 

To develop hypotheses about the rela- 
tionship among perceptual circles, Figure 
3 will be used again. Consider two circles 
which are near to each other, Circles 1 and 
2, for example. From the analysis of the 
behavioral circumplexes we know that
these two circles are very close, more so
than some other contiguous circles like
2-3, 4-5, and 6-7. When the two circles
are near to each other the position of their
respective variables is similar to the one
found for two behavioral circles. Any
given variable correlates highest with the 
corresponding one in the other circle, so 
that the interrelationship between the two 
sets of variables should preserve the cir- 
cumplex pattern. This pattern is indeed 


apparent in the matrix of Table 8, giving 
the intercorrelations between the variables 
of perceptual Circles 1 and 2 for the hus- 
band. In this matrix there are 11 devia- 
tions from the prediction. The matrix for 
Circles 7 and 8, also very near to each 
other, is similar to the matrix of Table 8 
(see Appendix). 

Let us now consider two circles which 
are quite apart in the figure like Circles 
1 and 6, social acceptance of other and 
emotional rejection of self. Figure 3 sug- 
gests, for example, that Variable III of 
Circle 6 is nearer, or more related, to Vari- 
able VIII of Circle 1 than to its correspond- 
ing Variable III of the same Circle 1. In 
other words, the ideal self of the wife is 
more related to what she actually receives 
from her husband than to her norm of 
behavior toward her husband. The matrix 
of intercorrelations between the variables 
of Circles 1 and 6 (for the husband ob- 
server) is given in Table 9. Inspection of 
this matrix shows indeed that the above 
prediction is supported: the correlation be- 
tween III, 6 and VIII, 1 is higher, .21, 
than the correlation between III, 6 and III, 
1, .05. In general, in this matrix the highest 
correlations are around the border, and 
the correlations tend to decrease as one 
moves toward the center of the matrix. This 
intercorrelation pattern is similar to the 
one of Tables 4 and 5 for the multiple 
coefficients of correlation and is obviously 
different from the circumplex pattern. In
a circumplex the highest correlations are 


TABLE 8
AN EXAMPLE OF INTERCORRELATIONS BETWEEN TWO PERCEPTUAL CIRCLES NEAR TO EACH OTHER

                                           Behavioral Type 1
                    Perceptual              Perceptual Type
                    Type         I   II  III   IV    V   VI  VII VIII

Behavioral Type 2   I           [illegible]
                    II          61   72   55   47   41   41   43   55
                    III         57   59   70   52   48   45   46   50
                    IV          48   54   56   61   39   42   37   44
                    V           26   34   33   29   61   38   32   41
                    VI          39   40   40   38   40   60   50   52
                    VII         52   43   43   37   38   53   64   64
                    VIII        56   54   44   36   39   47   57   76


TABLE 9
AN EXAMPLE OF INTERCORRELATIONS BETWEEN TWO DISTANT PERCEPTUAL CIRCLES

                                           Behavioral Type 1
                    Perceptual              Perceptual Type
                    Type         I   II  III   IV    V   VI  VII VIII

Behavioral Type 6   I           15   15   09   11   06   06   13   18
                    II          14   13   12   11   04   02   14   14
                    III         14   13   05   08   01   01   06   21
                    IV          09   09   05   06  -01   05   06   12
                    V           18   16   10   12   00   05   10   13
                    VI          25   20   16   16   04   06   10   17
                    VII         22   16   09   08   04   07   09   17
                    VIII        25   18   15   15   04   09   12   17


found near the main diagonal; here they 
are found around the borders of the table 
and the correlations decrease as one moves 
toward the center of the table. The more 
interpersonal the two correlated variables, 
the higher their coefficient of correlation.
The correlation pattern of Table 9 indicates
that ideal-self-rejection correlates highest
with lack of actual acceptance from the
other. Actual behavior of the other toward
the self influences both the actual and the
ideal self, while the norm of the other appears
to play a minor role. Actual behavior
is more interpersonal than ideal
behavior; the above results support, therefore,
the prediction that the relationship
between actual and ideal behavior should 
be higher than between different kinds of 
ideal behavior. 

We have considered two extreme ex- 
amples of relationship between perceptual 
circles. When the two circles are very near 
to each other, as 1 and 2, their intercorrelations
approximate the circumplex pattern:
each variable tends to correlate highest with
the one that corresponds to it on the other
circle. When the two circles are distant, as
1 and 6, the correlation pattern tends
toward the single-factor model of Spear- 
man (1927); in Spearman, the single fac- 
tor is general intelligence, here it is the 
interpersonality of the two variables. A 
less interpersonal variable in one circle 
will correlate more with a strongly inter- 
personal one in the other circle than with 
its corresponding variable. So it happens, 
in the example given above, that ideal self 
is related to actual other more than to ideal 
other. 
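
The border-versus-center contrast in Table 9 can be summarized with a short computation. This is a sketch only: averaging the border cells and the interior cells is a convenience adopted here and is not a statistic reported in the study. The values are the Table 9 entries with decimal points omitted, as in the table.

```python
# Table 9: rows are Perceptual Types I-VIII of Behavioral Type 6,
# columns are Perceptual Types I-VIII of Behavioral Type 1.
TABLE_9 = [
    [15, 15, 9, 11, 6, 6, 13, 18],
    [14, 13, 12, 11, 4, 2, 14, 14],
    [14, 13, 5, 8, 1, 1, 6, 21],
    [9, 9, 5, 6, -1, 5, 6, 12],
    [18, 16, 10, 12, 0, 5, 10, 13],
    [25, 20, 16, 16, 4, 6, 10, 17],
    [22, 16, 9, 8, 4, 7, 9, 17],
    [25, 18, 15, 15, 4, 9, 12, 17],
]


def border_and_interior_means(matrix):
    """Return (mean of border cells, mean of interior cells)."""
    last_row = len(matrix) - 1
    last_col = len(matrix[0]) - 1
    border, interior = [], []
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            on_border = i in (0, last_row) or j in (0, last_col)
            (border if on_border else interior).append(value)
    return sum(border) / len(border), sum(interior) / len(interior)


border_mean, interior_mean = border_and_interior_means(TABLE_9)
print(round(border_mean, 1), round(interior_mean, 1))

# The specific comparison quoted in the text: III,6 with VIII,1 versus III,6 with III,1.
print(TABLE_9[2][7], TABLE_9[2][2])
```

In a circumplex the same computation would favor the cells along the main diagonal rather than the border.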


The ringex model suggests that the effect 
of interpersonality becomes stronger with 
the increase of the distance between 
the two intercorrelated perceptual circles. 
Therefore, it may be expected that, as 
distance increases, the correlation matrix 
will move away from the circumplex pat- 
tern toward the single-factor pattern. 

This proposed interplay of interpersonality
with the circumplex pattern may be
illustrated by an example which is of some 
interest in terms of social exchange. Let 
us consider the correlation of Variable VII 
from one circle with Variables I and VII 
from another one. Variable I is more inter- 
personal than VII; if the correlation is 
determined by the interpersonality of the 
two variables, then r(I-VII) > r(VII-VII).
The circumplex model, on the other
hand, leads to the opposite prediction: 
Variable VII should be closer to the corre- 
sponding VII, on the other circle, than to 
I. Taking the variables from Circles
1 and 4, which are rather distant, supports
the first prediction: interpersonality gains
the upper hand. The second prediction, following
the circumplex model, is supported, however,
when the variables are taken
from the two contiguous circles, 2 and 3.
The respective coefficients, for the observer
husband, are: r(VII, 4-I, 1) = .38; r(VII,
4-VII, 1) = .30; r(VII, 3-I, 2) = .30;
r(VII, 3-VII, 2) = .45. For the wife the
coefficients are closely similar.

When these results are translated into 
plain English, with the help of Table 1, 
they read as follows: 

1. Giving oneself status (VII, 4) is more
related to receiving status from the other
(I, 1) than to giving it to him (VII, 1). 

2. Giving oneself love (VII, 3) is more 
related to giving love to the other (VII, 2) 
than to receiving it from him (I, 2). 

These results suggest that the economics 
of status and love exchange are somewhat 
different. In love, the more one gives to the 
other the more he has for himself. To have 
status, however, one has to receive it from 
the other. 
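
The comparison behind these two statements can be restated as a small lookup. This is an illustrative sketch: the four coefficients are the ones quoted above for the observer husband, while the verbal labels and the function name are introduced here for readability and are not the study's notation.

```python
# Coefficients for the observer husband, keyed by (behavior toward self, partner variable).
COEFFICIENTS = {
    ("status", "receive"): 0.38,  # r(VII, 4 - I, 1)
    ("status", "give"): 0.30,     # r(VII, 4 - VII, 1)
    ("love", "receive"): 0.30,    # r(VII, 3 - I, 2)
    ("love", "give"): 0.45,       # r(VII, 3 - VII, 2)
}


def stronger_correlate(resource):
    """Return whether giving the resource to oneself goes more with
    receiving it from the other or with giving it to the other."""
    receive = COEFFICIENTS[(resource, "receive")]
    give = COEFFICIENTS[(resource, "give")]
    return "receive" if receive > give else "give"


for resource in ("status", "love"):
    print(resource, stronger_correlate(resource))
```

For status the interpersonality prediction holds (receiving dominates); for love the circumplex prediction holds (giving dominates).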

Several examples of intercorrelation
matrices between perceptual circles are 
given in the appendix (see Footnote 3). In 
general they tend to follow the proposed 
pattern. There are, however, several deviations
and also some systematic features
which require further study. In analyzing 
matrices relating the behavior toward the 
other to the behavior toward self, as for 
example, the matrix of Table 9, one should 
remember that when the behavior is toward 
the other there is a gap between Types IV 
and V. When the behavior is toward the 
self the gap is between Types I and VIII. 
This has the effect of producing a certain 
shift in the relationship between the vari- 
ables of the two circles. 

The evidence presented in the earlier 
sections clearly supports the ringex model. 
The intercorrelations among perceptual 
circles are less conclusive in this regard
and further analysis may lead to some 
modifications in the proposed model. 


Relationship between Observers 


So far we have been concerned with the 
relationship between variables as perceived 
by the same observer, the husband or the 
wife. It has been suggested that these vari- 
ables are organized in a cognitive pattern 
which has been called the ringex. To gain 
some understanding of the relationship 
between the two observers, let us now 
intercorrelate the variables of a circle from 
the husband’s ringex with the variables of 
the corresponding circle from the wife’s 
ringex. 

An example of the intercorrelations 
among the behavioral types of the two 
observers is given in Table 10. It refers to
the actual behavior of the husband as
perceived (by the two observers) from his


point of view. This table shows how the 
husband's perception of his actual behavior 
toward his wife and toward himself is 
related to the wife's perception of the 
same behavior. Both observers take the 
point of view of the husband. The intercorrelation
pattern of Table 10 tends toward
the circumplex, in spite of the 10
deviations found in it. Some of the devia- 
tions are due to the fact that certain 
diagonal entries are lower than expected: 
this feature suggests a tendency toward 
the single-factor pattern. Indeed the rela- 
tionship between the wife's perception of 
the husband's behavior toward himself and 
the corresponding perception of the hus- 
band is sometimes lower than expected in 
a circumplex, but still higher than in the 
single-factor pattern.

The effect of the interpersonality of the 
variables is more apparent in the inter- 
correlations among the perceptual types 
of the two observers, for a constant be- 
havior type, given in Table 11. Type I for 
one observer corresponds to Type VIII for 
the other observer, Type II, to Type VII 
and so on. This correspondence can be 
checked by using the facet definition of 
these types in Table 1. Thus, to place the 
correlation between corresponding types in 
the main diagonal of the table the order of 
one observer (the wife) had to be reversed. 
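
The reordering described here amounts to reversing the type order for one observer. A minimal sketch follows; the helper name and the toy matrix are hypothetical and serve only to show the operation.

```python
TYPES = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII"]

# Type I for one observer corresponds to Type VIII for the other, and so on.
CORRESPONDING = dict(zip(TYPES, reversed(TYPES)))


def reverse_columns(matrix):
    """Reverse the column order so that corresponding types fall on the main diagonal."""
    return [list(reversed(row)) for row in matrix]


print(CORRESPONDING["I"], CORRESPONDING["II"])

# Toy 3 x 3 example: the largest entries sit on the anti-diagonal before
# the reversal and on the main diagonal after it.
toy = [[1, 2, 9], [2, 9, 2], [9, 2, 1]]
print(reverse_columns(toy))
```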

This table relates the perceptions of the 
two observers of their mutual social accept- 
ance, actual and ideal. The coefficients tend 
to be higher at the four corners of the 
table and to decrease toward the center, 
following the single-factor pattern. Thus, 
the highest correlations are between the 
perceptions of actual behaviors. The norm 
of one observer, on the other hand, tends 
to be associated with the other observer's 
perception of actual behavior, at least as 
much as with the other observer's norm. 

These results suggest that the relation- 
ship between observers tends to follow the 
same pattern as within one observer, but 
the distance being larger, as shown by 
the smaller correlations, the effects of inter- 
personality tend to become stronger than 
within the same observer. Apparently, each 
observer tends to infer the other's perception
of less interpersonal behaviors, such
as ideal behavior and behavior toward oneself,
from the more interpersonal ones.
More interpersonal behavior appears to be
more visible or more overt than the less
interpersonal one.


TABLE 10
INTERCORRELATIONS AMONG THE BEHAVIORAL TYPES OF THE TWO OBSERVERS (ACTOR: HUSBAND;
LEVEL: ACTUAL; ALIAS: HUSBAND)

                    Behavioral              Observer wife
                    Type         1    2    3    4    5    6    7    8

Observer husband    1           32   24   12   08  -05   03   22   22
                    2           26   21   09   04   05   00   20   20
                    3           21   16   32   23   04   06   00   02
                    4           08   05   23   19   08   10  -03  -02
                    5           04   04   10   14   18   06   10   06
                    6           11   08   05   06   13   13   13   14
                    7           26   21   01  -01   03   14   37   30
                    8           33   29   07  -02   01   09   29   32

All this suggests that there are two ways 
for building an image of the private picture 
of the other. One possibility is to use as a 
point of departure our own private picture: 
then covert behavior will be related to 
covert behavior more than to overt behavior 
and this will produce a circumplex pattern. 
Another possibility is to infer the private 
world of the other from his overt behavior: 
then the relationship between covert and 
overt will be higher than between covert 
and covert behaviors; this will produce a 
single factor pattern. When both ways are 
used the pattern may be somewhere between 
these two ideal types. Some of our results 
also suggest that the observer’s own self is 
influenced by the overt behavior of the 


other toward him, so that in this case the 
intercorrelations will again approach the 
single-factor pattern. The ringex model at- 
tempts to predict which pattern can be ex- 
pected in each particular case. 


Reliability of the Data 


To assess the reliability of some of the 
present findings, one can compare them with 
the results of some earlier investigations. 
Some indirect support for the hypothesis 
of order of the perceptual types is provided 
by a study of the foreman-worker role 
(Foa, 1958), using the same perceptual 
facets as in this study, but an entirely 
different technique of observation: pictures 
rather than a questionnaire. 

With regard to the circular structure of 
the behavioral types, there is quite a num- 
ber of studies (for a review, see Adams, 
1964, and Foa, 1961) pointing in the same 
direction. The above studies employed
techniques of observation different from
those used in this study. This will answer
the possible criticism that the regularity
of the results may be due to some artifact
of procedure in gathering the data. There
is nothing in the procedure used which adds
factual support to such a suspicion. The
only instance in which the order of the
questions is identical with the order of the
variables is in perceptual Types II-III and
VI-VII, and these types do not correlate
higher than other contiguous types which
were observed by nonconsecutive questions.


TABLE 11
INTERCORRELATIONS AMONG THE PERCEPTUAL TYPES OF THE TWO OBSERVERS (BEHAVIORAL TYPE 1:
SOCIAL ACCEPTANCE OF OTHER)

                    Perceptual              Observer wife
                    Type      VIII  VII   VI    V   IV  III   II    I

Observer husband    I           32   26   19   14   18   27   32   31
                    II          31   28   25   17   23   24   30   26
                    III         26   25   25   15   18   22   23   23
                    IV          23   19   24   19   21   19   21   18
                    V           17   14   18   12   22   20   22   19
                    VI          20   20   19   17   20   23   26   26
                    VII         22   16   12   10   19   28   32   33
                    VIII        29   21   17   12   23   32   34   32

With regard to the ringex structure as 
a whole, there is certainly need for further 
validation. The results obtained seem, how- 
ever, to be good enough for accepting it as 
a point of departure for future investiga- 
tions. 


DISCUSSION AND CONCLUSION


We have attempted to present a picture 
of the cognitive organization a person has 
of his relationship to another person in 
reciprocal roles. A beginning has also been 
made in relating the picture of one person 
to the picture of the other one. The proposed
cognitive organization is essentially
based on two rationales: the developmental
and the interactive. The ringex
suggests which of the two will be prevalent 
in determining a particular interrelation- 
ship pattern. The developmental rationale 
proposes that the relationship among vari- 
ables is determined by the manner in which 
these variables become differentiated dur- 
ing the psychosocial development of the 
child. We have seen that this rationale 
leads to the prediction of the order of 
contiguity of the variables: usually a cir- 
cumplex order and, more rarely, a simplex 
one. The interactive rationale, on the other 
hand, proposes that the relationship be- 
tween variables is determined by the inter- 
personal situation, here and now. Accord- 
ing to this rationale the variables can be 
ordered from the most interpersonal or
overt to the least interpersonal or covert.
It follows that a covert variable will be
more closely associated with an overt one 
than with another covert variable. In this 


context the notion of maturing may be 
understood as moving away from the or- 
ganization pattern resulting from the de- 
velopment sequence, toward a pattern more 
attuned to the realities of a specific inter- 
personal situation. 

This same issue of development versus 
interaction is also present in the organiza- 
tion of family roles (Foa, Triandis, & 
Katz, 1966). It has been suggested, for 
example, that the role of husband toward 
wife is modeled on the role of son to 
mother. To what extent will the husband’s 
role remain similar to the son’s role or will 
be influenced by the behavior of the wife 
toward the husband? The structure of the 
family roles suggests the behavioral in- 
fluence may be stronger when the two 
reciprocal roles occupy similar power posi- 
tions. 

This problem is intimately related to the 
question of whether interpersonal behavior 
can be changed and how. Conditioning
therapy appears to be close to the interactive
viewpoint, as is any other “here
and now” therapy. Psychoanalysis puts
more stress on the developmental aspect: 
the developmental process has to be some- 
how reexperienced in order to obtain a 
change in behavior. Our data seem to in- 
dicate that both development and inter- 
action influence behavior and may provide 
some pointer in suggesting which com- 
bination of techniques may be best for 
obtaining a certain behavioral change. It 
may become possible to do so when a num- 
ber of research problems suggested by the 
present findings are solved. Some of these 
problems will be briefly outlined here 
below. 


Further Research 


Even a modest advance in the under- 
standing of interpersonal behavior is un- 
usually suggestive of new research. This 
may be due to the fact that interpersonal 
behavior is at the crossroad of several 
areas of psychology, relating, as it does, to 
child development and cross-cultural re- 
search, as well as to role analysis and 
clinical psychology. The role that interpersonal
behavior may play in the theoretical
integration of these different areas can be
illustrated by the following examples of
research problems generated by the present 
findings. 

1. The data of this study refer to a 
sample of a “normal” population of married 
couples. It becomes of interest to investi- 
gate how various kinds of psychiatric cases 
differ from normals and the effects of 
therapy on these differences. It has been 
suggested that the average frequency of 
types of behavior may differ (Adams, 
1964). It is now proposed that such differ- 
ences may also be found in the size of the 
correlation coefficients among variables, 
while the order position of the variables 
in the structure may remain invariant. The 
correlation between certain variables may 
be higher in data obtained from psychiatric 
cases than for normals in a given kind of 
disorder and lower than for normals in 
another kind of disorder. If some correla- 
tions increase (or decrease) as compared to 
normals, some other correlations will de- 
crease (or increase) in the same kind of 
disturbance. It is proposed, in other words, 
that in psychiatric patients variables will 
maintain the same order as in normal in- 
dividuals but will change their respective 
distance or degree of differentiation: some 
variables will approach certain others while 
moving away from other variables. Some 
findings of other investigators relating, for 
example, to differences between actual- 
self- and ideal-self-perception among nor- 
mals and various kinds of mental patients 
seem to point in this direction. This type 
of investigation may ultimately lead to a 
new typology of behavior disturbances, 
based on structural differences, and rela- 
tively simple tests for differential diagnosis. 

2. Changes in degree of differentiation 
within the same order may also be found 
in different cultures. Results supporting this
hypothesis in two cultural groups are re- 
ported in another paper (Foa, 1964). These 
changes in differentiation from one culture 
to another one seem to be systematic and 
related to the value system of the culture. 
Some preliminary evidence suggests that
cross-cultural differences in the differentiation
of interpersonal behavior lead to tension
and conflict when persons from different
cultures have to cooperate in a
common task or to negotiate. It has been 
found possible to reduce heterocultural 
strain by training a person from one cul- 
ture to make the interpersonal differentia- 
tions that are required in the other cul- 
ture (Foa & Chemers, 1966). 

3. The ringex structure refers to a given 
pair of reciprocal roles. It appears of in- 
terest to investigate the relationship among 
the various roles of a person, both from a 
developmental and a structural point of 
view. It is commonly accepted that the 
different roles one has to play in adult 
life originate from a small number of roles 
developed in early childhood. It has often 
been suggested that one of the ways in 
which a child acquires new roles is by 
taking the role of the other. It is a common
experience to see young girls playing the 
role of the mother toward a doll, while 
the doll is, so to speak, playing the role of 
the girl. This exchange of roles, which has 
been so far described intuitively, finds a 
precise expression in the ringex structure: 
the new role is created by interchanging 
the elements of the actor facet in the old 
role. The actor “nonobserver” of the old
role becomes the actor “observer” of the
new role and vice versa. Thus the new role
will, at least at the beginning, look like the 
mirror image of the old one. These con- 
siderations have a direct bearing on the 
problem of ordering different roles in a 
contiguity pattern. A beginning in this 
direction has been made by a study of 
cross-cultural invariance in the organiza- 
tion of the roles of the family system 
(Foa, Triandis, & Katz, 1966). It remains 
now to investigate the organization of roles 
in other social systems such as the school, 
work, religion, and their relationship to 
family roles. The work done on the family 
roles indicates that the semiringex of one 
actor will be near to the semiringex of the 
other actor, as in our case, only when the 
two actors occupy similar power positions, 
as husband and wife. When the power is 
different, as in father and son, each one of 
these two roles relates more to some other 
role than to its reciprocal role. This sug- 
gests that, when dealing with all the roles 
of a social system, it may be more convenient
to consider, as a unit of analysis,
the semiringex of each role rather than the 
ringex of two reciprocal roles. 

4. The sequence of differentiation of 
interpersonal concepts, which has been proposed
here, needs to be tested through the
investigation of the cognitive organization
of children of different ages. Such a testing
requires, however, some further theoretical 
work. Two sequences have been presented 
here: one for the behavior facets and the 
other for the perceptual facets. A third 
sequence, differentiating between the facets 
of family roles (and having the actor facet 
in common with the perceptual sequence 
given here) has been described by Foa, 
Triandis, and Katz (1966). Obviously these 
three sequences, involving eight facets, have 
to be brought together in a single develop- 
ment pattern which may account for the 
organization within and between family 
roles. 

Some findings (Foa, 1964) suggest that 
differentiation at a given stage of the 
sequence may be more or less strong 
depending on the value system of the cul- 
ture of the child. The culture is, of course, 
mediated to the child through his im- 
mediate social environment, the family, 
and peer groups. It becomes then of im- 
portance to know how efficient is the en- 
vironment in producing in the child the 
degree of differentiation prescribed by the 
culture. If too much or too little differen- 
tiation occurs in the cognitive development 
of the child this may result in maladjustive 
interpersonal behavior. The possibility that 


behavioral problems may be related to deviations
from the degree of differentiation,
which is normal in a particular culture, has 
been discussed earlier. 

The theoretical approach underlying 
these proposals may be summarized as fol- 
lows: 

a. The sequence of differentiation in the
child and the resulting cognitive organiza- 
tion of interpersonal behavior in various 
roles of the adult are cross-culturally in- 
variant. 

b. The degree of differentiation at a 
given stage is prescribed by the culture 
and may thus vary from one culture to 
another one. 

c. When, as a result of social environ- 
mental conditions in childhood or other 
factors, the differentiation actually made 
is lower or higher than the culturally pre- 
scribed one, and thus differs from the one 
of most other individuals, maladjustive be- 
havior may occur. 

d. A similar maladjustive situation oc- 
curs when an individual is operating in a 
culture other than his own. Here, however, 
his cognitive organization differs from the 
prevailing one because his culture is dif- 
ferent and not because he deviates from his 
own culture. 

These considerations suggest that the 
study of the differentiation and organiza- 
tion of interpersonal behavior in different 
roles may provide a focus for the theoretical
integration of concepts from such different
areas as developmental, cross-cultural,
and abnormal psychology.


INTELLECTUAL ABILITIES OF SYMBOLIC AND
SEMANTIC JUDGMENT¹


RALPH HOEPFNER, KAZUO NIHIRA, AND J. P. GUILFORD
University of Southern California


2 studies approached the problem of describing judgmental processes from 
the standpoint of individual differences in terms of basic traits. Based upon 
Guilford’s structure-of-intellect model, the factors of symbolic and semantic 
evaluation were hypothesized to exist as distinct from one another and also 
from factors represented in other domains of the model. Experimental tests
were developed as measures of the hypothesized factors. Measures of reference
factors were also employed to demonstrate the uniqueness of the
hypothesized factors. The tests were administered to 2 samples of high-school
students, scores were factor analyzed, and axes analytically rotated,
resulting in the demonstration of the 12 hypothesized evaluation factors and
all the reference factors as uncorrelated dimensions of intellectual ability.
The conclusion is that the model has continued to lead fruitfully to undiscovered,
differentiable intellectual aptitudes.


Following several successful attempts
to validate Guilford's theory of intel- 
ligence, in which intellectual aptitude 
factors of creative production, problem 
solving, and symbolic thinking were iso- 
lated and investigated, the attention of the 
Aptitudes Research Project turned to the
intellectual operation of evaluation. Other 
operations—operations being one of the 
three facets of Guilford’s model—had 
received attention previously; divergent 
production in several studies of creative po- 
tential; and cognition and convergent pro- 
duction in studies of problem-solving and 
symbolic factors. 

In the most recent explication of his 
complete model, Guilford and Merrifield 
(1960) define the operation of evaluation 
as reaching decisions or judging on the 
basis of goodness according to certain cri- 
teria. Although inclusion of this operation 


¹ The studies reported herein are two in a
series conducted by the Aptitudes Research Pro- 
ject at the University of Southern California, 
under Contract Nonr-228(20) with the Office of 
Naval Research, Personnel and Training Branch. 
The ideas expressed do not necessarily reflect the 
views of that agency. The authors wish to extend 
their special thanks to Philip R. Merrifield, who 
aided greatly in the planning and the supervision 
of the construction and administration of the 
test batteries. 


in the model of intelligence was somewhat 
theoretical, there was a history of discov- 
ery of evaluation-like factors, defined 
by tests that did not correlate highly with 
cognition and production tests, but which 
formed weak factors of their own, instead. 


A HISTORY OF EVALUATION FACTORS


The first of these factors can be attrib- 
uted to L. L. Thurstone (1938a) when he 
identified the factor that became known as
“perceptual speed,” and that later was 
recognized as an evaluation factor—the 
evaluation of figural units (EFU). The 
factor has been most consistently defined 
by tests that require the rapid comparison 
of figural objects with judgments of identity
versus nonidentity (Guilford & Lacey,
1947). But there have been times
when there was uncertainty as to whether
it also applied to tests requiring the identification
and matching of letters, numbers,
and words (Coombs, 1941; Thurstone,
1938b). Thus, the perceptual-speed factor is
of interest, since the structure-of-intellect
model forecasts a distinct but parallel ability
concerned with the identity versus nonidentity
of literal material, and tests of
EFU could serve as models for the hypothesized
evaluation of symbolic units (ESU)
factor, which heretofore had not been
clearly demonstrated. One or two studies
cited by French (1951) gave some hope of 
such differentiation, for example, Bechtoldt 
(1947), but there was nothing decisive. 

The factor of “judgment” was found in 
the Army Air Force research during World 
War II (Guilford & Lacey, 1947). It was 
largely identified by its association with a 
test called Practical Judgment, which was 
composed of verbally stated problems or 
predicaments of a common everyday type. 
Multiple-choice answers offered different 
more-or-less plausible solutions, the ex- 
aminee (E) to select the best alternative. 
A similar factor has been identified as 
judgment in each of three studies by the 
Aptitudes Research Project (Berger, Guil- 
ford, & Christensen, 1957; Guilford, Green, 
Christensen, Hertzka, & Kettner, 1954; 
Kettner, Guilford, & Christensen, 1959a),
sometimes with Practical Judgment as a 
marker test, sometimes not. 

"Speed of judgment" was first identified 
as a factor by Thurstone (1944) in a study 
of perception. The tests which defined this 
factor involved time scores of making 
choices of color versus form (in classify- 
ing figures), of desirability of personality 
traits, and of weights involving the size- 
weight illusion. The perceptual nature of 
this factor indicated its resemblance to a 
factor called “perceptual evaluation,” iso- 
lated later in a factor-analytic study of 
evaluative abilities by Hertzka, Guilford, 
Christensen, and Berger (1954). The leading
test for this factor involved the ability
to judge rapidly the length of pairs of 
lines or the size of given figures. Thus, the 
factor was defined as the ability to ap- 
praise rapidly the similarities and differ- 
ences among simple perceptual materials. 
Both “speed of judgment" and “perceptual 
evaluation” factors are primarily con- 
cerned with this ability. It seems that 
these factors are quite similar to the well- 
known factor of “perceptual speed,” which 
has to do with judgments of identity ver- 
sus nonidentity of figural material. The 
authors are not aware of any systematic 
studies which investigated the equivalence 
or differences among those factors of per- 
ceptual evaluation. 


In the same factor-analytic study 
(Hertzka et al., 1954), a factor called 
“speed of evaluation” was isolated. The 
leading tests for the factor involve judg- 
ing whether or not named objects satisfy 
simultaneously certain specified criteria, 
such as roundness and hardness. In such 
tests, decisions are easy, so that speed be- 
comes an important variable, hence the 
naming of the factor. Prior to Hertzka’s 
study, Bechtoldt (1947) independently 
identified a similar factor which was later 
called “speed of association” by French 
(1951). The tasks involved in the tests 
that defined the speed-of-association fac- 
tor are essentially conceptual or verbal 
rather than perceptual evaluation, since 
the tests deal with the meanings of words, 
objects, and sentences. Thus, the speed-of- 
association factor is probably equivalent 
to Hertzka’s speed-of-evaluation factor. 

The factor of “logical evaluation,” also 
formerly called “logical reasoning,” has 
appeared in a number of analyses (Frick, 
Guilford, Christensen, & Merrifield, 1959; 
Green, Guilford, Christensen, & Comrey, 
1953; Guilford et al., 1954; Hertzka et al., 
1954; Kettner et al., 1956). The most con- 
sistent tests identifying the factor were in 
the form of multiple-choice syllogisms and 
variations of tests in the syllogistic cate- 
gory. The factor has been considered eval- 
uative because in these tests E is not 
required to produce conclusions but to eval- 
uate conclusions presented to him. There 
was a seemingly parallel factor, identified 
most consistently by a test called Symbol 
Manipulation, which was essentially syl- 
logistic in form but with letter symbols 
rather than words. This is the nearest that
previous research has come to demonstrat-
ing a symbolic-evaluation factor.

The factor of “experiential evaluation” 
has been defined chiefly by reference to 
the test that has most strongly represented 
it in two analyses—Unusual Details 
(Hertzka et al., 1954; Marks, Guilford, & 
Merrifield, 1959). The test presents pic- 
tured situations, with E to state two 
things he sees wrong with each picture. 
The “things wrong” may be contrary to 
past experience or to other items of in-
formation within the pictures. The inter-
pretation and naming of the factor have 
placed emphasis upon the obvious role of 
past experience, without much considera- 
tion that most judgments (and other men- 
tal operations as well) depend upon past 
experience. 

“Sensitivity to problems” was first hy- 
pothesized as an important ability con- 
tributing to creative thinking (Wilson, 
Guilford, Christensen, & Lewis, 1954) and 
was demonstrated as a factor. It has 
since been verified as a factor in other 
studies (Kettner et al., 1959a, 1959b; 
Marks et al., 1959; Merrifield, Guilford, 
Christensen, & Frick, 1962). The factor 
was defined as the “ability to see defects, 
needs, and deficiencies,” and was thought 
to belong in the evaluative domain, with 
the conjecture that simply being aware of 
the existence of problems, or aware that 
things are not all right, is an instance of 
evaluation (Guilford, 1959). 


DEFINITIONS 


The review of these known factors en- 
abled the writers to draw some inferences 
that were logically helpful. The two major 
deductions have to do with the fact that 
evaluation is a multiple-dimensional af- 
fair, and with the nature and definition of 
evaluation. With regard to the multidi- 
mensional hypothesis, the indication is 
that there are as many as five verbal- or 
semantic-evaluation factors. The same 
hypothesis is supported by the recognition 
of as many as three nonverbal evaluation 
factors. Thus, the way seemed open for a
considerable number of differentiable 
evaluative abilities. 

The conclusion concerning the definition 
of evaluation itself could not be so uni- 
vocal as that concerning the existence of 
multiple factors. An empirically based def- 
inition is best achieved from examina- 
tion of the factors and of their tests. As 
the factors discussed above are considered, 
one finds that the crucial type of activity 
can be variously described. Perhaps the 
most general property of evaluation tests 
is that they require decisions. Multiple- 
choice tests require decisions among al-
ternative answers. Other tests require es-
sentially yes-no decisions. As many testers 
well know, a multiple-choice test can also 
involve a set of yes-no decisions, with each 
alternative answer (together with the 
item stem) being a true-false item. In a 
multiple-choice syllogism test, either of 
verbal or symbolic content, E may make
a yes-no decision regarding every alterna- 
tive answer; it does or it does not follow 
logically from the premises. 

Still other factor tests emphasize sen- 
sitivity, in which E has to detect defects
or deficiencies. The tests usually are in
completion form, and no choice is offered 
among alternatives. The sensitivity inter- 
pretation can be applied meaningfully in 
syllogistic tests, in which detection of
logical errors or discrepancies is needed. It 
might be applied to tests involving yes-no 
decisions regarding identity of pairs of ob- 
jects or series of symbols—a sensitivity to 
differences or discrepancies—and to tests 
involving the criterion of satisfaction of 
specified criteria, in the form of detection 
of failure to meet the criteria. 

The issues just raised were not fully 
realized or fully met at the time part of 
this study was planned, but they were 
given attention in connection with the 
study of symbolic-evaluation abilities. It 
turned out that in the semantic study 
there was a bias in favor of the hypothesis 
that evaluation involves decisions among 
alternatives, and most tests selected or de- 
veloped for that study are of multiple- 
choice form. Thus, relative judgments were 
emphasized rather than yes-no judgments 
or detection of things wrong. Several 
kinds of criteria were recognized and uti- 
lized, however, including identity, suitabil- 
ity, or effectiveness of given information 
for certain purposes. 

In formulating a definition of “evalua- 
tion” for the symbolic study, several al- 
ternatives were considered. Two of them 
will be mentioned, since they have a direct 
bearing on the distinction between “sensi- 
tivity” tests and “estimation” tests. We 
shall then state a broader definition that 
embraces both conceptions. 

In a definition that equates evaluation
to "sensitivity to error," the term "error"
is interpreted broadly to include any kinds
of defects, deficiencies, departures, incon-
sistencies, incongruities, etc. This view im-
plies absolute judgments: to each individ-
ual a thing can be judged as being right
or identical with another (within tolerance
limits upon those variables thought to be 
relevant), or it is not. Some individuals de- 
tect such "errors" with low tolerance for 
deviations with respect to relevant infor- 
mation where others cannot do so. This is 
not to say that there is a dichotomy of 
individuals; they can still vary by small 
degrees along a continuum of greater or 
less sensitivity. 

In the definition that emphasizes “es- 
timation,” it is implied that individuals 
also make relative judgments. When items 
of information fall short or deviate from a 
standard, one may deviate farther than 
another. It may be obvious that all the
items of information depart from the 
standard, but which one deviates least? 
Where sensitivity tests typically call for 
absolute judgments of a yes-no, disjunc- 
tive type, estimation tests typically offer 
alternative items of information and ask 
which one deviates least, or sometimes (but 
rarely) which one deviates most. A ranking 
of alternatives is implied. This view of 
estimation is concordant with that offered 
by Johnson (1955). 

Actually, the two views can be brought 
logically under the same definition of 
evaluation. In both cases, a standard of 
some kind is implied. In both cases, cri- 
teria for judgment are implied. A defini- 
tion embracing both views would read: 
Evaluation is a matter of decision concern-
ing criterion satisfaction. This is the defini-
tion adopted as the basis for planning 
these studies. One of the objectives was to 
determine whether tests embodying the 
sensitivity principle and those embodying
the estimation principle would both in-
dicate the same kinds of ability, thus jus-
tifying subsuming them both under a sin- 
gle definition of evaluation. 
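
The distinction between the two views can be sketched in a few lines of Python. The sketch is an illustration only and is not part of the studies reported here; the numerical standard, the tolerance values, and the function names are assumptions chosen to show that an absolute (sensitivity) judgment and a relative (estimation) judgment are both decisions concerning criterion satisfaction.

```python
def sensitivity_judgment(item, standard, tolerance):
    """Absolute yes-no judgment: is the item acceptable, i.e., within
    the judge's tolerance limits around the standard?"""
    return abs(item - standard) <= tolerance


def estimation_judgment(items, standard):
    """Relative judgment: which alternative deviates least from the standard?"""
    return min(items, key=lambda item: abs(item - standard))


# Hypothetical standard and alternatives, invented for illustration.
standard = 100
alternatives = [93, 104, 110]

# A judge with low tolerance rejects an item that a more tolerant judge accepts.
strict = sensitivity_judgment(104, standard, tolerance=2)
lenient = sensitivity_judgment(104, standard, tolerance=5)

# Every alternative departs from the standard; estimation asks which departs least.
closest = estimation_judgment(alternatives, standard)
```

In both functions the same quantity, deviation from a standard, is examined; only the form of the decision differs.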

When it is said that evaluation is con- 
cerned with criterion satisfaction, it is then 
necessary to give attention to what kinds 
of criteria are suitable for use in tests of
evaluation abilities. Some of the tradi-
tional criteria have been: identity versus
deviation, completeness versus incomplete- 
ness, compatibility versus incompatibility, 
congruity versus incongruity, effectiveness 
versus ineffectiveness, and suitability ver- 
sus unsuitability. For testing purposes, a 
criterion must be of a type that can be com- 
municated to the examinee. As will be seen 
in discussions of the results, some additional 
kinds of criteria were included in these 
studies, such as popularity versus unpopu- 
larity (frequency of usage) and highly 
probable versus improbable. With sym- 
bolic and semantic materials to be evalu- 
ated, the use of either esthetic or moral 
criteria can be and was avoided. Questions 
regarding those two kinds of criteria are 
more likely to arise with greater urgency 
in analyses pertaining to figural informa- 
tion on the one hand and behavioral in- 
formation on the other. 

Implied in the preceding discussion of
evaluation is the fact that the study of 
evaluation abilities was carried out in two 
parts, a symbolic study (Part 1) and a 
semantic study (Part 2). Such a split pro- 
gram was necessary due to the amount of 
testing time available for any one group 
of examinees. 


THE EXPECTED FACTORS AND THEIR TESTS


During the late 1950’s there was devel- 
oped a theoretical model of intellectual 
abilities called the “structure of intellect” 
(Guilford & Merrifield, 1960), designed to 
bring into a single systematic classification 
the intellectual factors then known. In the 
model the primary intellectual abilities 
are classified in terms of three major di- 
mensions: operations, contents, and prod- 
ucts. There are five kinds of operations 
which the organism can perform: cogni- 
tion, memory, divergent production, con-
vergent production, and evaluation. There
are four kinds of contents, or broad cate- 
gories of information, upon which the 
organism can perform the operations: fig- 
ural, symbolic, semantic, and behavioral. 
The third dimension of the model repre- 
sents the six kinds of products: units, 
classes, relations, systems, transformations, 
and implications. The products are the re-
sults of the organism’s psychological proc-
essing of information.

The six kinds of products of the model 
have been defined in a number of places, 
but will be very succinctly defined here.
Units are segregated items of information
having thing character. Classes are groups
of items of information having common 
properties. Relations are meaningful con- 
nections between units. Systems are or- 
ganized or structured complexes of units 
and relations. Transformations are changes 
or redefinitions of known items of infor- 
mation. Implications are in the form of 
expectancies, predictions, or consequences 
of information. 
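
The three-way classification can be made concrete with a short Python sketch, offered only as an illustration of how the cells of the model are enumerated and named. The letter codes follow the three-letter factor symbols used in this report (e.g., CSU, MSI, DSC, NSS, EFU); the code "B" for behavioral content is an assumption, since no behavioral factor is named in this section.

```python
from itertools import product

# Letter codes as they appear in the three-letter factor symbols of this report.
# "B" for behavioral is an assumption; no behavioral factor is named here.
OPERATIONS = {"C": "cognition", "M": "memory", "D": "divergent production",
              "N": "convergent production", "E": "evaluation"}
CONTENTS = {"F": "figural", "S": "symbolic", "M": "semantic", "B": "behavioral"}
PRODUCTS = {"U": "units", "C": "classes", "R": "relations",
            "S": "systems", "T": "transformations", "I": "implications"}


def decode(code):
    """Expand a three-letter cell code into operation, content, and product."""
    op, content, prod = code
    return f"{OPERATIONS[op]} of {CONTENTS[content]} {PRODUCTS[prod]}"


# Every cell of the model: 5 operations x 4 contents x 6 products.
cells = ["".join(c) for c in product(OPERATIONS, CONTENTS, PRODUCTS)]

# The evaluation cells for symbolic and semantic content.
evaluation_cells = [c for c in cells if c[0] == "E" and c[1] in "SM"]

print(len(cells))             # 120
print(len(evaluation_cells))  # 12
print(decode("EST"))          # evaluation of symbolic transformations
```

The 12 evaluation cells selected at the end are the cells for symbolic and semantic content with which these studies are concerned.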

An objective of some importance in 
these studies is that they represent further 
attempts to validate Guilford’s model as a 
source of hypotheses for defining and iso- 
lating factors of human intelligence. If the 
model continues to generate concepts 
which can be found to represent unique 
abilities, then its contribution to psycho- 
logical explanation and prediction will be 
further substantiated. 

Not only is the existence of new factors 
deduced from the model, but the model 
further offers operational specifications for 
the measures needed for the factors. Such 
measures, which serve as the “empirical 
world” for verification of the factors and 
the model, are then available as instru-
ments for study of the traits in new in-
vestigations involving those traits. Today’s 
new factors and their experimental tests 
become tomorrow’s reference concepts and 
marker tests for use in other kinds of in- 
vestigations. They also become available 
for applied psychological prediction and 
selection, which is the ultimate social- 
value testing of the model itself. 

The studies reported here attempted to 
explore intensively the 12 factors of sym- 
bolic and semantic evaluation hypothe- 
sized from the model. The hypothesized 
factors were deduced from the model, and 
the hypotheses were to be tested by factor 
analysis. Previous studies at the Project 
have shown the fruitfulness of such a re-
search strategy.

Since most test responding can be viewed 
as a problem-solving process, it is worth-
while to consider the nature of evaluative
processes in terms of a problem-solving 
model. Such considerations will lead to a 
rationale for developing the operational 
specifications for separating evaluative 
abilities from the rest of the problem-solv- 
ing process. 

According to Dewey’s (1910) problem- 
solving model, evaluation plays its most 
important role at the last phase of the 
problem-solving process. In this sense eval- 
uation is used to mean the testing of the 
possible solutions produced by the problem 
solver before employing one of them. 

In the structure-of-intellect model, eval- 
uation is defined broadly as the process of 
decision as to whether any item of infor- 
mation that is cognized, remembered, or 
produced meets a certain standard or goal. 
In terms of the traditional problem-solv-
ing models, the operation of cognition cor-
responds to the phase commonly referred
to as “understanding the problem.” The
operation of production, either divergent or
convergent, corresponds to the phase com-
monly described as “suggestion of possible
solutions.” The operation of memory is re-
quired at all five stages of Dewey's model, 
since stored information has some bearing 
upon every mental event. 

The operation of evaluation should play 
an important role at the last phase of the 
problem-solving process if the possible solu- 
tions are at all doubtful or competitive. But 
evaluation may also be called for at any 
phase of the process, whenever there is un- 
certainty as to whether the information, 
either remembered, cognized, or produced, 
meets specified criteria. 

The problem of the operational specifi- 
cation for separating evaluative abilities 
from parallel cognitive abilities deserves 
a few comments. In the preceding section, 
evaluation was defined as the process of 
deciding whether information that is 
cognized, remembered, or produced meets 
certain standards or goals. A test de-
signed to measure an evaluative ability
should pitch the evaluative aspect of the 
task at a level of difficulty that will en- 
sure that individual differences in test 
scores reflect that source. On the other 
hand, whatever cognitive, memory, or
production aspects the test may have
should be made so easy that they con- 
tribute nothing to variance of the scores. 
Difficulty can be introduced into the de- 
cision process by providing an appreciable 
degree of uncertainty as to which of sev- 
eral alternatives is best. This can be 
achieved by making all of the alternative 
answers relevant, even acceptable under 
low standards of acceptability, and about 
equally desirable. Cognition can be kept 
at a low level of difficulty by specifying 
the problem and the criteria very clearly 
and by using very familiar information. 

In designing factor-analytic studies, a 
sufficient number of tests for each hypoth- 
esized factor should be present in the test 
battery, so that each factor axis may be 
overdetermined in rotations and each fac- 
tor clearly interpretable in terms of the 
apparent unique psychological function 
shared by all its tests and by no others. 
In this study, at least three tests were em- 
ployed for each of the 12 hypothesized 
evaluation factors. 

An important goal for the well-designed 
factor-analytic test of factor hypotheses is 
not only to determine what the experimen- 
tal constructs are but also what they are 
not. The formal hypothesis states that the 
12 factors of evaluation are not only dis- 
tinct from one another, but are also dis-
tinct from other factors deduced from the
model. For this reason, a number of 
marker tests, known from previous ex- 
perience to measure reference factors, were 
included in the analyses to demonstrate 
the distinctness of the new experimental 
factors from factors already known. 

The usual test of the distinctness of ex- 
perimental factors is made by selecting for 
simultaneous analysis reference factors 
that might possibly be identical with the 
experimental factors. Of all the noneval- 
uation factors, those of cognition were sus- 
pected of being most likely to be confused 
with the experimental evaluation factors. 
One reason is that it takes care to con- 
struct an evaluation test that does not 
offer necessary cognition problems of suffi- 
cient difficulty to introduce some cogni- 
tion variance into the total scores, or, in- 
deed, that does not become a cognition test
instead of an evaluation test. For this
reason, tests of parallel cognition factors 
were selected as the most important 
marker tests in the analyses. In addition, 
tests of memory, divergent production, 
and convergent production were included 
when it appeared important to do so. 


The Symbolic Study 


The whole logic of symbolic communi-
cation as compared with conceptual or
semantic communication is that more pre-
cision can be had due to the denotative in-
flexibility of symbols. One might then ask 
the question, “What is there to evaluate in
connection with symbolic information?” 
We might expect some different aspects to 
evaluation of symbolic information than 
those that apply to semantic information, 
which is relatively rich with connotative 
meaning. 

Several different varieties of informa- 
tion conform to the definition of symbols 
stated by Guilford and Hoepfner (1963): 
“Information in the form of signs, having 
no significance in and of themselves. [p. 
2]." The clearest example of a symbol is a 
number. Numbers have no significance in 
and of themselves, yet can be evaluated 
for numerical identity, order, or consist- 
ency, with respect to other numbers. Let- 
ters also conform to the definition when 
they are processed in terms of their literal 
properties rather than their figural proper- 
ties. Syllables can be symbolic units, as 
well as words, when their semantic mean- 
ings are not relevant to the task, as in 
breaking words into syllables or in word 
compounding. All these types of symbols 
were used in various experimental tests in 
this study. 

The existence of six distinct product 
factors of symbolic evaluation provides the
major problem of this study. The demon- 
stration of these six factors, or the failure 
to do so, would confirm or fail to confirm 
the model from which the hypothesis was 
deduced. The tests designed as measures 
for each of the products will be described 
in detail later. 

With three kinds of symbols available 
and with the distinction between sensitiv-
ity and estimation tests, for a completely
systematic experimental design it would 
have been desirable to have six experi- 
mental tests for every product factor. No 
effort was made to achieve fully this kind 
of coverage with experimental tests, and it 
was difficult to achieve all six kinds of tests 
with every product. There proved to be 
enough dispersion of the conditions to 
make possible answers as to whether both 
sensitivity tests and estimation tests serve 
to measure these evaluation factors and 
whether the kind of symbol makes a differ- 
ence in the success of tests. 
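
The arithmetic of the completely systematic design can be laid out in a brief Python sketch. The labels for the three kinds of symbols (numbers, letters, words) are an assumption made for illustration, since the three kinds are not enumerated at this point in the text.

```python
from itertools import product

# Assumed labels: the text speaks of "three kinds of symbols" without naming
# them at this point; numbers, letters, and words are used for illustration.
SYMBOL_KINDS = ["numbers", "letters", "words"]
TEST_TYPES = ["sensitivity", "estimation"]
PRODUCTS = ["units", "classes", "relations", "systems",
            "transformations", "implications"]

# One cell per (symbol kind, test type) combination for a single product factor.
cells_per_product = list(product(SYMBOL_KINDS, TEST_TYPES))

# A complete design would need one test per cell for each of the six products.
full_design = list(product(PRODUCTS, SYMBOL_KINDS, TEST_TYPES))

print(len(cells_per_product))  # 6
print(len(full_design))        # 36
```

A fully crossed design would thus call for 36 experimental tests.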

Symbolic reference factors. The marker 
tests for reference factors selected for 
analysis in the symbolic study are de- 
scribed below. More complete descriptions 
for all the tests employed in this analysis 
can be found in Hoepfner, Guilford, & 
Merrifield (1964). 

CSU—Cognition of symbolic units 
(symbol cognition): 

Disemvowelled Words—Recognize fa- 
miliar words with dashes in place of 
vowels; then complete the words by 
writing the vowels. 

Word Combinations—Produce a new 
word from the ending of one word and 
the beginning of another.

CSC—Cognition of symbolic classes:

Number Classification—Select one of 
five alternative numbers to fit into each 
of four classes of three given numbers 
each. 

Number-Group Naming—State how 
the numbers in each set of three are 
alike. 

CSR—Cognition of symbolic relations: 

Seeing Trends II—Describe a trend
based upon relations of letters in a 
group of words. 

Word Relations—Recognize the same 
relation between words in each of two 
pairs, then complete a third pair from 
five alternative words using the same 
relation. 

CSS—Cognition of symbolic systems:

Circle Reasoning—Discover the prin-
ciple by which one circle is blackened in 
each of four rows of circles and dashes. 
Apply the rule to the fifth row. 

Letter Triangle—Find the pattern of
the letters arranged systematically
within a triangle.

CSI—Cognition of symbolic implica- 
tions: 

Symbol Grouping—Rearrange scram- 
bled symbols in a specified systematic 
order as efficiently as possible. 

Word Patterns—Arrange a list of 
short words efficiently in a crossword- 
puzzle design. 

CMU—Cognition of semantic units (ver-
bal comprehension):

Iowa Tests of Educational Develop- 
ment—Test 8, General Vocabulary— 
Recognize the meanings of words com- 
monly used in communication. This test 
is similar to standard verbal comprehen- 
sion (CMU) marker tests. 

Preliminary Scholastic Aptitude Test 
—Verbal—PSAT is an abbreviated form 
of the SAT. The verbal score is the sum 
of scores on four tests: Opposites, Sen- 
tence Completion, Analogies, and Read- 
ing Comprehension. The dominant sat- 
uration is hypothesized to be CMU with 
some CMR variance contributed by the 
Analogies test. 

Cooperative School and College Abil- 
ity Test—Verbal—Verbal score is com- 
posed of scores on two tests, Sentence 
Understanding and Word Meanings. 
Tests similar to each have previously 
loaded on the verbal-comprehension 
(CMU) factor. 

MSI—Memory for symbolic implica-
tions (numerical facility):

Numerical Operations—Rapidly add, 
subtract, or multiply simple numerical 
problems and select one of six alterna- 
tives as the answer. 

DSC—Divergent production of symbolic 
classes: 

Number Grouping—Group given 
numbers into several different classes 
based upon properties they have in 
common. 

Varied Symbols—Find the different 
common properties that sets of letter 
combinations may have in common.

NSS—Convergent production of sym-
bolic systems:

Operations Sequence—Produce the 
correct order of three specified numeri-
cal operations in order to get from one
given number to another.

Word Changes—Arrange a list of 
words, each containing the same number 
of letters, so that the first word is
changed into the last word with only
one letter change at each step.

NST—Convergent production of sym-
bolic transformations:

Camouflaged Words—Find within a 
meaningful sentence a group of consecu- 
tive letters that spell the name of a 
sport or game. 

Word Transformation—Separate let- 
ters of words in a phrase with vertical 
lines to make a different set of words.

NSI—Convergent production of sym-
bolic implications:

Form Reasoning—From the table, 
find the form that is implied by the 
three given forms. 

Sign Changes—Solve simple arithme- 
tic problems in which the operation sign 
is changed according to a set of rules.

EFU—Evaluation of figural units (per-
ceptual speed):

Identical Forms—Find one of five 
figures that is exactly the same as the 
given figure. 

Perceptual Speed—Rapidly match 
each of five objects to one of four given 
objects. 

Finger Speed: 

Marking Speed Test—Make as many 
Xs as possible in the rows of squares 
provided. 

Symbolic evaluation factors. The tests for 
the hypothesized factors were developed 
using either of two approaches or a com- 
bination of the two. In the first approach, 
specific examples of tasks are deduced 
from the operation-content-product com- 
bination being investigated. For exam- 
ple, the ability in the cell EST, evaluation 
of symbolic transformations, involves eval- 
uation of changes in symbolic materials. 
A code can be an example of a symbolic 
change, and so the test, Decoding, was de- 
veloped. 

The second approach emphasizes tasks
similar to those for established factors 
having one or two attributes in common 
with the new factors. For example, ESU,
evaluation of symbolic units, and EFU,
evaluation of figural units, differ only with 
respect to the content category; test for- 
mats might be very similar. 

Nearly 30 different pretest booklets were 
administered to classes in psychology at 
several colleges in the Los Angeles area. 
These pretestings were designed to obtain 
technical information such as the appro- 
priate level of item difficulties, compre- 
hension level of the test instructions, test 
reliabilities, and optimal time require- 
ments for newly developed tests. Exten- 
sive item analyses were conducted when- 
ever pretesting information revealed low 
reliability estimates. 

From the reliability and intercorrela- 
tion data obtained from pretesting, 25 tests 
were selected from a pool of over 40 tests 
especially designed or adapted to measure 
the six experimental factors. The selected 
tests had pretest reliabilities in the .70's 
and .80's. Within-factor intercorrelations 
were generally considerably higher than 
between-factor intercorrelations, further 
ensuring the demonstration of the dis- 
tinctness of the expected factors. The cri- 
teria of high reliability and desirable cor- 
relational pattern determined which tests 
were finally selected to represent the ex- 
perimental factors in the final analysis. In 
the following paragraphs, the six experi- 
mental factors and the tests selected to 
measure them are discussed in detail. 
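
The correlational criterion just described can be illustrated with a minimal Python sketch. It is not the procedure actually used in pretesting, which is not specified in detail; the test names and correlation values are hypothetical and serve only to exercise the comparison of within-factor with between-factor intercorrelations.

```python
from itertools import combinations
from statistics import mean


def within_between(correlations, factor_of):
    """Return (mean within-factor r, mean between-factor r).

    correlations maps frozenset({test_a, test_b}) to the correlation between
    the two tests; factor_of maps each test name to its hypothesized factor.
    """
    within, between = [], []
    for a, b in combinations(sorted(factor_of), 2):
        r = correlations[frozenset((a, b))]
        (within if factor_of[a] == factor_of[b] else between).append(r)
    return mean(within), mean(between)


# Hypothetical tests and values, invented only to exercise the function.
factor_of = {"T1": "ESU", "T2": "ESU", "T3": "ESC", "T4": "ESC"}
correlations = {
    frozenset(("T1", "T2")): 0.50,
    frozenset(("T3", "T4")): 0.40,
    frozenset(("T1", "T3")): 0.20,
    frozenset(("T1", "T4")): 0.10,
    frozenset(("T2", "T3")): 0.20,
    frozenset(("T2", "T4")): 0.10,
}

mean_within, mean_between = within_between(correlations, factor_of)
desirable_pattern = mean_within > mean_between
```

A test whose correlations with tests of other factors matched or exceeded those with tests of its own factor would fail this check.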

ESU—Evaluation of symbolic units. 
The five experimental tests developed to 
measure evaluation of units had in com- 
mon symbolic stimuli that are processed as 
wholes, rather than separated, analyzed, or 
classed. Although similar symbolic stimuli 
are employed in tests of the other intellec- 
tual products, the mental process per- 
formed upon units must maintain the thing 
quality of the stimuli. Guilford and Hoepf- 
ner (1963) had suggested tests employing 
letters and digits as stimuli for measures 
of ESU based on tenuous prior studies 
(French, 1951). Construction of the experi- 
mental ESU tests was based upon the his- 
tory of parallel tests of symbolic units and 
the tryout of new kinds of stimuli. 

The test, Correct Spelling, employed 
complete, common English words as sym-
bolic stimuli. The words function as sym-
bols because E is to direct evaluation 
toward spelling rather than meaning. The 
E is tested on his sensitivity to the cor- 
rectness or incorrectness of the spelled 
symbolic unit. In this case, sensitivity to 
spelling is based largely upon the long-
term retention of the correct symbolic ele- 
ments of standard English words. The 
words employed as items were selected 
from lists of commonly misspelled words 
published in English handbooks and secre- 
tarial manuals. 

Derived from the format of a test used 
by Thurstone (1938a), Derivations also 
employs complete English words as test 
stimuli. Whereas Thurstone’s test has Es
make as many short words as they can in a 
limited time from the letters in a large 
given word, Derivations supplies not only 
the given word, but also 50 short words 
possibly derived from it. The E's judg- 
ments are based upon sensitivity to the er- 
rors in some words that could not be de- 
rived from the long given words. 
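
The judgment required in Derivations can be stated as a simple letter-count check, sketched below in Python. The sketch assumes that a letter may be used no more often than it occurs in the given word; the scoring rule is not stated in the text, and the item shown is hypothetical rather than taken from the test.

```python
from collections import Counter


def can_be_derived(short_word, given_word):
    """True if every letter of short_word is available in given_word.

    Assumption: a letter may be used no more often than it occurs in the
    given word; the test's actual scoring rule is not stated in the text.
    """
    needed = Counter(short_word.lower())
    available = Counter(given_word.lower())
    return all(available[letter] >= count for letter, count in needed.items())


# Hypothetical item, not taken from the test itself.
given = "generation"
candidates = ["ration", "great", "garden", "tenant"]
judgments = {word: can_be_derived(word, given) for word in candidates}
```

Under this assumption, "garden" is rejected because the given word contains no "d", and "tenant" is rejected because it needs two "t"s where only one is available.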

Familiar Letter Combinations is an ex- 
perimental test that has a completely new 
type of symbolic stimuli: three-letter syl- 
lables. The E is to estimate which of two 
given syllables is more common as a part 
of real English words. Familiarity is the 
criterion for decision. Neither the syllables
nor the criteria of real words are to be 
considered semantically; only the relative 
frequencies of occurrence are relevant. 
The key for this test was determined 
from the empirical frequency counts re- 
ported by Underwood and Schulz (1960). 
The nonsense syllables are paired so that 
the keyed syllable is far more commonly 
used than its alternative. 
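
The keying principle can be illustrated with a short Python sketch. The test itself was keyed from the published frequency counts cited above; the sketch substitutes a toy word list for those counts, so the list, the counts derived from it, and the function names are illustrative assumptions only.

```python
from collections import Counter


def trigram_counts(words):
    """Count every run of three consecutive letters across a word list."""
    counts = Counter()
    for word in words:
        word = word.lower()
        for i in range(len(word) - 2):
            counts[word[i:i + 3]] += 1
    return counts


def keyed_alternative(pair, counts):
    """Return the member of the pair that occurs more often in the counts."""
    first, second = pair
    return first if counts[first] >= counts[second] else second


# Toy word list standing in for a large frequency count; illustrative only.
words = ["nation", "station", "ration", "motion", "action", "quiz"]
counts = trigram_counts(words)
key = keyed_alternative(("ion", "qui"), counts)
```

Only frequency of occurrence enters the decision; the meanings of the words in the list play no part.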

Letter “U” is a test of E's sensitivity to 
the presence of a specified letter in words 
under speeded conditions. It is based upon 
Thurstone’s test Letter “A” (1938b), which 
split its variance among factors that 
Thurstone called perceptual, number, and 
word factors. Bechtoldt (1947) found Let- 
ter “A” to be loaded highly on a factor 
with a test of crossing out specific letters 
on a page of regularly spaced letters. Al- 
though Cattell names the factor on which 
this test is loaded “speed of symbol dis-
crimination” (Cattell, 1953), and Guilford
and Hoepfner (1963) suggest the factor is 
ESU, French, Ekstrom, and Price (1963) 
conclude that our knowledge concerning 
this factor is not at all clear since several 
“subfactors” tend to pull together in dif- 
ferent ways, depending upon the tests in- 
cluded in the factor-analytic battery. In
general, a test like Letter “U” is often 
found in strong relation to perceptual- 
speed tests. To clarify this ambiguity, not 
only were four tests designed to measure 
ESU included in the battery along with
Letter “U,” but also two strong perceptual-
speed (EFU) tests. 

The test Symbol Identities was designed 
as a measure of E's sensitivity to the iden- 
tity or nonidentity of paired sets of num- 
bers, letters, and words, under speeded 
conditions. It is essentially parallel to 
tests of EFU, in which identity of pairs 
of figures is in question. Symbol Identities 
is the only ESU test employed that cuts
across all the possible stimuli considered 
appropriate in the symbolic domain. 

Symbol Identities is similar to many of 
the tests designed to measure clerical 
speed and accuracy; E decides whether or 
not the two members of pairs of symbol 
sets are the same or different. This test, 
like Letter “U,” could conceivably share
much figural variance, as Es could com- 
pare each symbol stimulus, figure by fig- 
ure, and arrive at an accurate judgment. 
Such activity is very inefficient, however; 
a figural attack upon Symbol Identities 
should result in poor performance, unless it 
is used only when a quick symbolic at- 
tack does not yield a decisive choice. 

ESC—Evaluation of symbolic classes: 
Four experimental tests were developed to 
measure the factor ESC. A symbolic class 
was defined for this investigation as a 
group of symbols with some common prop- 
erty. Such a group of symbols would be 
composed of at least two members whose 
common property must be symbolic, not 
figural or easily semanticized. 

In Best Number Class, E's task is to as- 
sign given numbers to one of four classes 
in such a way as to maximize each num- 
ber's value by assigning it to the most ex- 
clusive class it fits. The four classes into 
which the stimuli were to be assigned
were, in order of exclusiveness: EVEN
MULTIPLES, ODD MULTIPLES, SQUARES, and
PRIMES. The Es were warned that the
numbers could possibly be assigned to sev- 
eral classes and that credit could only be 
earned by assigning each number to its 
most exclusive class. 

The test Best Number Pairs is the other 
hypothesized ESC measure employing 
numbers as stimuli. The E's task is to 
choose one of three pairs of numbers that 
makes the best class. In order from best 
to poorest, the classes are: pairs of per-
fect squares, pairs of multiples of the
same number, pairs of odd or even num- 
bers, and pairs with no class property. 

Sound Grouping is a test with a long 
history. In each item, four words are 
given, three of which are fairly good 
rhymes and one is not. The latter is to be 
noted and selected, for the right answer. 
The test’s factorial composition has been 
open to considerable question because of 
its tendency to go with different factors, 
depending upon the battery in which it 
has been analyzed. 

Because of this history of factor insta- 
bility and the fact that the previous stud- 
ies did not include tests of what would now 
be called symbolic classes, Sound Grouping 
was hypothesized to measure ESC, a seem- 
ingly logical place for it. It was not ex- 
pected, however, that Sound Grouping 
would suddenly become a unifactor test 
when placed in a battery with several ESC 
tests. 

The fourth test designed to measure 
ESC is Word Choice. The E is to choose 
the best of three possible additions to a 
class of three words. The class properties 
used in Word Choice are symbolic, for ex-
ample, order or nearness of certain let-
ters or types of letters in the words. This
test differed from other ESC tests in that 
none of the alternative words for any class 
completely possessed all the class proper- 
ties; a best word had to be chosen, even 
though it was slightly wrong. It is thus an 
estimation test. 

ESR—Evaluation of symbolic relations. 
The four tests developed to measure the
factor ESR employed recognized connec-
tions, based upon symbolic variables, be-
tween symbolic units. Examples of con-
nections based upon symbolic variables are
“greater than,” “equal number of con-
sonants,” and “similar ratios.”

The first experimental ESR test, Re- 
lated Words I, was adapted by analogy to 
Matched Verbal Relations, designed for 
factor EMR, evaluation of semantic rela- 
tions. In Related Words I, E estimates 
which of three alternative word pairs is 
most similar to a given related pair. The 
relation between members of any pair is 
based upon the order and position of let- 
ters and the vowels and consonants that 
are changed or moved. This is the only 
ESR test in which no alternative answer 
is completely correct; only a best alterna- 
tive is to be selected. 

Sign Changes II had been developed as 
an ESR test to be used in a predictive 
battery for success in ninth-grade mathe-
matics courses (Guilford, Hoepfner, &
Petersen, 1965). The task in this test is to 
determine what sign changes, if any, must 
be made to change a numerical expression 
into an equation. An elementary under-
standing of arithmetical operations and
the relationships of equality and inequal- 
ity of expressions is all E needs in order 
to understand clearly the test items and 
the task. 

Similar Pairs is a new test, in both idea 
and items. The stimuli are word pairs, the 
members of which are related by letter lo- 
cations and letter changes. The E's task is 
to judge whether the members in two such 
pairs are or are not similarly related. The 
process involved in responding to this 
test is sensitivity to sameness or different- 
ness of the relations within the word pairs. 
In all the items, the relations within the 
pairs were kept extremely simple, so that 
there would be little or no difficulty in cog-
nizing the relationships. In this way, cognition
variance would be minimized in the test 
scores and the relative importance of the 
evaluation variance would be maximized. 

Symbol Manipulation is a test of the 
ability to decide whether a given relation- 
ship between two letters follows logically 
from other statements of relationship in-
volving the same letters, where the rela-
tionships are “greater than,” “equal to,”
and “less than,” and their negations, all
statements in symbolic form. 

ESS—Evaluation of symbolic systems:
Like the tests hypothesized to measure 
ESU, tests for ESS seemed to be easy to 
construct. Almost one dozen tests were de- 
veloped to measure ESS and were pre- 
tested. Most of the tests at this experi- 
mental stage proved to have reasonably 
good reliability and reasonable intrafac- 
tor correlations. The five tests chosen to 
define the systems factor in the final anal- 
ysis broadly cover the various types of 
symbolic content and sensitivity versus
estimation. 

All the stimuli for tests of symbolic 
systems are organized aggregates of units 
or relations wherein the interrelated or in- 
teracting parts are symbolically defined 
within the aggregate. The system, then, is 
the organization or pattern of parts, which 
may be compared with another system as 
to identity or similarity or which may be 
evaluated for internal consistency. 

Best Letter Set was designed as a meas- 
ure of E's ability to estimate which of 
three sets of three or four letters each is 
most like a given set. The criterion of 
similarity is based upon the order and 
kinds of letters within the set. Although 
such small sets of letters might appear to 
function as units, the systems qualities of 
the alternative sets were sufficiently simi-
lar to force E to focus on them. It seemed
highly unlikely that even the most sophis- 
ticated E could treat the stimuli as units 
and obtain a high score on this test. 

Both Correct Letter Orders and Cor-
rect Number Series are tests of E’s sensi-
tivity to internal inconsistencies in sym-
bolic systems. The stimuli in both tests
are sequences of symbols organized accord- 
ing to some simple principle, similar to 
items in familiar number-series tests. The 
systematic principle is stated verbally and 
E is to judge whether or not the sequence 
follows that principle. 

The test Series Relations might also 
be considered an evaluative form of a
number-series test, even though the task
appears to be quite unlike that for Correct
Number Series. In Series Relations, E is
given a series of three numbers and is 
told that each element of the series except 
the first one is determined from the pre- 
vious element (one to the left) according 
to some unknown rule. The E is then to 
estimate which of three alternative rules or 
operations would best relate each series 
element to the previous one. Although E 
might simply try each rule upon the first 
and second series elements, obtain a three- 
number series, and compare it to the given 
one, selecting the correct rule, he is forced 
into making a choice or judgment because 
none of the three alternative rules is fully 
correct. That is, no one rule will cor- 
rectly reconstruct the series from the first 
element; but one will come nearest. 
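
The estimation called for in Series Relations can be sketched in a few
lines of Python. This is only an illustration, not the scoring
procedure of the test: the item, the three rules, and the use of summed
absolute differences as the measure of "nearness" are all assumptions
introduced here.

```python
from typing import Callable, Dict, Sequence


def reconstruction_error(series: Sequence[int], rule: Callable[[int], int]) -> int:
    """Rebuild the series from its first element by applying the rule
    repeatedly, then sum the absolute differences from the given series.
    (The error measure is an assumption made for this sketch.)"""
    rebuilt = [series[0]]
    for _ in series[1:]:
        rebuilt.append(rule(rebuilt[-1]))
    return sum(abs(given - made) for given, made in zip(series, rebuilt))


def nearest_rule(series: Sequence[int], rules: Dict[str, Callable[[int], int]]) -> str:
    """Return the name of the rule that comes nearest to the given series."""
    return min(rules, key=lambda name: reconstruction_error(series, rules[name]))


# Hypothetical item: no rule reproduces the series exactly.
series = [3, 7, 16]
rules = {
    "add 4": lambda n: n + 4,
    "double": lambda n: n * 2,
    "double and add 2": lambda n: n * 2 + 2,
}
best = nearest_rule(series, rules)
```

In this made-up item every rule leaves some error, so a choice can be
made only by judging which rule comes nearest, which is the estimation
feature the test was meant to have.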

In the test, Way-Out Numbers, E is 
presented with a list of four ordered num- 
bers and is instructed to choose either the 
first or last one on the basis of its being 
farther away from the remaining three 
numbers. In other words, E is to arrange 
the numbers on the dimension of numeri-
cal value and is to choose that extreme
number whose value is farther from the 
other numbers' values. 
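
A minimal sketch of the Way-Out Numbers judgment follows, assuming that
"farther" is measured by the gap between an extreme number and its
nearest neighbor. For four ordered numbers this gives the same choice
as comparing each extreme with the mean of the remaining three, since
both comparisons reduce to asking whether the lowest gap exceeds the
highest gap. The function name and the sample items are hypothetical.

```python
from typing import Optional, Sequence


def way_out(numbers: Sequence[float]) -> Optional[float]:
    """Return the extreme number lying farther from the rest,
    or None when the two extremes are equally far (assumed tie rule)."""
    ordered = sorted(numbers)
    low_gap = ordered[1] - ordered[0]      # distance of the smallest number
    high_gap = ordered[-1] - ordered[-2]   # distance of the largest number
    if low_gap == high_gap:
        return None
    return ordered[0] if low_gap > high_gap else ordered[-1]


# Hypothetical items
choice_a = way_out([12, 14, 15, 30])
choice_b = way_out([2, 19, 21, 24])
```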

EST—Evaluation of symbolic transfor-
mations. Three experimental tests were se-
lected to measure the factor EST. The ex-
treme difficulty of constructing reliable
evaluative tests of symbolic transforma- 
tions limited the choice of tests. The con- 
tent of the three tests was concerned with 
changes from one form of symbol to another 
equivalent form, or changes in symbolic 
units to meet certain requirements. The 
transformations tests developed for this 
study used letters and words as stimuli. It 
appeared, during test construction, that 
numerical stimuli were not readily suscep- 
tible to transformations without the in- 
volvement of other products, such as rela- 
tions or systems. The denotative inflexibility 
of numbers did not allow for equivalent 
forms of the same numerical value using two 
different symbols. This limitation had ap- 
plied also in the study isolating the only 
other known symbolic-transformation fac-
tor, NST (Guilford, Merrifield, Christensen,
& Frick, 1961). The tests loading on the
NST factor all involved words as the stim-
uli.

A rather common transformation of 
words and letters is any code that allows 
their encoding. In almost all cases of sym- 
bol coding, the transformation of one set 
of symbols to its encoded set of symbols 
is a one-to-one mapping of the symbol 
set onto the code set. Such a one-to-one 
mapping is suitable for a test of sensitivity 
to errors in coding only when the test is 
speeded and the coding system is well 
known by the Es. This implies that the sen- 
sitivity to slight, but possibly important, 
miscodings is an evaluative process for the 
individual who functions well (is experi- 
enced) in the coding process. 

Because no coding system is known 
with great generality within the popula- 
tion, and because it is inefficient and self- 
defeating to teach Es a complete coding 
system (memory factors might predomi- 
nate), the EST test, Decoding, employs a 
simple and ambiguous code. Simplicity 
and ambiguity were introduced into the 
coding system for Decoding by employing 
a code for letters which does not map one- 
to-one onto the alphabet. The ambiguity 
of the code allows for words to be judged 
according to their ease of encoding or de- 
coding. The change from an unambiguous 
code to an ambiguous one also changes the 
type of evaluation test involved. Whereas 
an unambiguous code and experience call 
for sensitivity, an ambiguous code calls for 
estimation; the code provides incomplete 
information, and E must estimate the com- 
plete information. 

In the test, Decoding, E is presented 
with two words and is asked to choose 
which one, if coded, would be easier to 
decode unambiguously. E is also given the 
opportunity to judge both words as equal 
in difficulty of decoding. 

Jumbled Words is the only test designed 
for EST that is in the sensitivity cate- 
gory. The E is given a stimulus word con- 
taining between five and seven letters and 
is to judge whether or not each of five al-
ternative words is an accurate anagram-
matic derivation from the given word.
Jumbled Words is, therefore, similar in
stimulus material to the ESU test, Deriva-
tions, which also uses anagram-type stim-
uli.
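
The judgment required in Jumbled Words amounts to checking whether two
words contain exactly the same letters. The sketch below shows that
check; the stimulus word and the five alternatives are invented for
illustration and are not items from the test.

```python
from collections import Counter


def is_anagram(stimulus: str, candidate: str) -> bool:
    """True when the candidate uses exactly the letters of the stimulus."""
    return Counter(stimulus.lower()) == Counter(candidate.lower())


# Hypothetical item: a six-letter stimulus with five alternatives
stimulus = "master"
alternatives = ["stream", "tamers", "mister", "smart", "maters"]
judgments = {word: is_anagram(stimulus, word) for word in alternatives}
```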

The third test designed for EST, Typing
Errors, is similar to Decoding in the task 
involved and the stimuli used. The E is 
given an incorrectly typed word and is to 
choose from among alternatives the word 
that the incorrectly typed word would most
likely be. The judgments are made on the 
basis of common typing errors due to the 
arrangement of the typewriter keyboard. 
A keyboard diagram is printed on each 
test page for E's reference. 
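
One way to picture the basis for judgment in Typing Errors is to count
how many letters of an alternative would have to slip to a neighboring
key to produce the mistyped word. The sketch below assumes a standard
QWERTY layout and treats only left and right neighbors in the same row
as likely slips; the layout, the slip rule, and the sample item are
assumptions and do not come from the test itself.

```python
from typing import List, Optional

# Letter rows of an assumed QWERTY keyboard
ROWS = ("qwertyuiop", "asdfghjkl", "zxcvbnm")


def same_row_neighbors(a: str, b: str) -> bool:
    """True when the two letters sit side by side in one keyboard row."""
    for row in ROWS:
        if a in row and b in row:
            return abs(row.index(a) - row.index(b)) == 1
    return False


def slip_count(typed: str, intended: str) -> Optional[int]:
    """Number of neighboring-key slips that turn `intended` into `typed`,
    or None when some difference cannot be explained that way."""
    if len(typed) != len(intended):
        return None
    slips = 0
    for t, i in zip(typed, intended):
        if t == i:
            continue
        if not same_row_neighbors(t, i):
            return None
        slips += 1
    return slips


def most_likely(typed: str, alternatives: List[str]) -> Optional[str]:
    """Pick the alternative explained by the fewest neighboring-key slips."""
    scored = [(slip_count(typed, word), word) for word in alternatives]
    explainable = [(n, word) for n, word in scored if n is not None]
    return min(explainable)[1] if explainable else None


# Hypothetical item
answer = most_likely("hoyse", ["horse", "house", "hoist"])
```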

The estimation process involved in re- 
sponding to Typing Errors is probably not 
dependent upon EST ability alone, how- 
ever. It would seem that some figural abil- 
ity would be involved in this test due to 
the spatial nature of the keyboard ar-
rangement. Further, it might be expected
that typing experience might enter into 
proficiency at the required task. However, 
the correlation of scores on Typing Errors 
and amount of typing experience was re- 
ported to be .03 (Hoepfner et al., 1964). 

ESI—Evaluation of symbolic implica- 
tions. Tests designed to measure the fac- 
tor ESI employed all types of symbolic 
stimuli. For the evaluation process, impli- 
cations are defined as the expectancies or 
probable relative values of the presented 
symbols (estimation), or possible symbolic 
interpretations of a unit or system (sen- 
sitivity to symbolic problems). 

The test, Abbreviations, presented E 
with a shortened spelling of a common 
word; E was to choose one of the three alter-
native words that the abbreviated word
most likely implies. The meanings of the 
words are irrelevant to choosing an alter- 
native, and the spelling of the alternatives 
is correct. The only task for E is to choose 
the most expected value for the abbrevia- 
tion, a task of estimating. No observance 
of shorthand principles was exercised in 
test construction; the abbreviations were 
short and relatively unambiguous. Usu- 
ally, but not always, this implied dropping 
vowels and unsounded consonants from the 
keyed alternative. The E was warned,
however, that sounding-out the abbrevia- 
tion would not necessarily aid him in his 
choice. Hoepfner et al. (1964) found the 
correlation between scores on Abbrevia- 
tions and experience with shorthand to be 
-.07.

Letter Problems is similar in format to 
Form Reasoning, a test of NSI. The eval- 
uation form uses letters as the stimuli and 
asks E not to solve the equation, but to 
judge the difficulty or possibility of solv- 
ing it, on the basis of provided rules. It 
was hypothesized that E would have to 
make his judgments based on foresight. 
The E’s judgments were of the three-
category type; problems were easy to solve
(straightforward), difficult to solve (in- 
volving manipulations), or impossible to 
solve due to inadequacies of the table of 
substitutions. 

The third ESI test is named S Test. In 
this test, E is given a stimulus about which 
he is to find a problem to solve. The solu-
tion indicates the nature of the problem to
which E was sensitive. The test might
therefore measure E's sensitivity to sym- 
bolic implications of unstructured prob- 
lems. It should be noted, however, that 
this test is not congruent with the concep- 
tion that evaluation is “sensitivity to er- 
ror,” for no error is judged. It is a test of 
E's sensitivity to implications (as, indeed, 
it turned out) rather than a sensitivity to 
errors in implications. 

Symbol Reasoning involves operations 
similar to the test, Logical Reasoning. 
The E is given two premises in the form 
of an equation involving inequalities such 
as x < y = 3z, and is asked to judge 
whether each of three conclusions (such as 
x = 3z) is true, false, or uncertain, on the
basis of the given equation. The equation
is a symbolic statement of the relation-
ships between pairs of three unknowns, x,
y, and z, in order. Each of the three con-
clusions to be evaluated involves one of the
three possible pairings of unknowns. 
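
The three-way judgment in Symbol Reasoning can be illustrated by
checking a conclusion against every assignment of values that satisfies
the premise. The sketch below samples small integers only, so a verdict
of "true" or "false" is a finite check rather than a proof; the value
range and the sample conclusions are assumptions made here.

```python
from itertools import product
from typing import Callable, Iterable

Relation = Callable[[int, int, int], bool]


def judge(premise: Relation, conclusion: Relation,
          values: Iterable[int] = range(-6, 7)) -> str:
    """Classify a conclusion as 'true', 'false', or 'uncertain' over all
    sampled (x, y, z) that satisfy the premise."""
    outcomes = {
        conclusion(x, y, z)
        for x, y, z in product(list(values), repeat=3)
        if premise(x, y, z)
    }
    if outcomes == {True}:
        return "true"
    if outcomes == {False}:
        return "false"
    return "uncertain"


def premise(x: int, y: int, z: int) -> bool:
    """The premise quoted in the text: x < y = 3z."""
    return x < y == 3 * z


verdict_extremes = judge(premise, lambda x, y, z: x == 3 * z)   # x and z
verdict_adjacent = judge(premise, lambda x, y, z: x < y)        # x and y
verdict_sign = judge(premise, lambda x, y, z: y > z)            # y and z
```

Under these assumptions the conclusion x = 3z comes out "false," x < y
comes out "true," and y > z comes out "uncertain," because the sign of
z is not fixed by the premise.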

Although it seemed reasonable to assume 
that conclusions involving adjacent un- 
knowns, x and y, and y and z, might be 
relational, and conclusions involving the
two extreme terms, x and z, would be more
clearly implicational, pretesting analysis 
showed that the separately scored kinds of 
conclusions intercorrelated highly. For this 
reason, and for the reason that the numer-
ical coefficients of the unknowns were not
the same in the premise and the conclu-
sions, it was decided that Symbol Reason-
ing should be a measure of ESI.


The Semantic Study 


When a combination of letters is not 
only recognized as a group of symbolic 
entities, but also conveys meaning in the 
form of words, that meaning is semantic 
information. Semantic information consti- 
tutes the major content of verbal thinking. 
It can be transmitted through a number of 
media including words and sentences as 
well as pictures that imply verbal conno- 
tations. 

Because of the richness of connotative 
meaning of words and the ambiguities in- 
herent in our language structure, the con- 
cept of semantic evaluation may be com- 
prehended easily at the conceptual level. 
However, in terms of test development, 
the existence of rich connotative mean- 
ings poses a difficult problem in insulating 
the processes of semantic evaluation from 
those of semantic cognition. Knowing or 
understanding the meaning of a word is a 
matter of semantic cognition. As it was 
stated previously, a test designed to meas- 
ure an evaluative ability should pitch the 
evaluative aspect of the task at a level of 
difficulty that will ensure that individual 
differences in test scores reflect that source. 
On the other hand, the cognitive aspects of 
the task should be made so easy that they 
contribute a minimum to variance of the 
scores. If this approach is correct, we 
should be able to develop a test of evalua- 
tion from a test of cognition by emphasiz- 
ing tasks that demand the evaluative 
operation and a minimum of cognitive op- 
eration. In fact, many tests in the cate- 
gory of semantic cognition, where all six 
factors have already been demonstrated, 
were used in this study as models for the 
development of tests of semantic evalua-
tion. Such a procedure, if successful, en-
sures that the new factors are not merely
the results of the difference in test for- 
mats. 

Semantic reference factors. A major con- 
cern in this study is the separation of 
evaluative abilities from cognitive abili-
ties. Six parallel semantic factors in the
area of cognition were therefore included 
as reference factors. The reason for this 
concern regarding the separation of cogni- 
tive from parallel evaluative factors is 
that so many of the evaluation tests were 
constructed by analogy to cognition tests,
and many of them resemble cognition 
tests except for rather subtle differences in 
emphasis. Even with the best of inten- 
tions, we could not expect that all cogni- 
tive variance would be eliminated from 
all such evaluation tests. The reference 
factors of semantic cognition and produc- 
tion are defined below, with mention of the 
tests used to represent them in this study.

CMU—Cognition of semantic units (ver-
bal comprehension):

California Achievement Test—Read-
ing Vocabulary—This vocabulary test is
composed of 180 words in the four prin- 
cipal areas of the school curriculum— 
Mathematics, Science, Social Science, and 
General. 

Verbal Comprehension—Select from 
alternatives the word that is similar in 
meaning to a given word. 

Word Completion—Write the defini-
tions or synonyms of given words.

CMC—Cognition of semantic classes:

Verbal Classification—Assign given
words to one of two classes or to neither,
each class defined by four other given
words.

CMR—Cognition of semantic relations: 

Verbal Analogies I—Discover the rela- 
tion between two words and select the 
word that completes an analogy, the se-
lection, as such, being quite easy.

CMS—Cognition of semantic systems
(general reasoning):

California Achievement Test—Arith-
metic Reasoning—This test consists of
four sections—Number Concepts, Sym- 
bols and Rules, Numbers and Equations, 
and Problems. 


Ship Destination Test—Find the dis- 
tance from a ship to given points, con- 
sidering the influences of several varia- 
bles. 

CMT—Cognition of semantic transfor- 
mations (penetration): 

Similarities—Write six ways in which 
common objects of a pair are alike.

CMI—Cognition of semantic implica-
tions (conceptual foresight):

Pertinent Questions—Write four ques- 
tions, the answers to which would serve 
as a basis for making a decision in a 
conflict situation. 

NMT—Convergent production of se- 
mantic transformations (semantic redefini- 
tion): 

Picture Gestalt—Indicate which ob-
ject in a photograph will serve a speci-
fied unconventional or uncommon pur-
pose.

DMI—Divergent production of semantic 
implications (semantic elaboration):

Possible Jobs—Write as many as six 
different jobs which might be indicated 
by a pictured emblem. 

Semantic evaluation factors. Of the five 
previously known factors mentioned in 
the introduction, four had been tentatively 
identified with semantic-evaluation cells 
of the structure-of-intellect model—“logi-
cal evaluation” with EMR, “experiential
evaluation” with EMS, “judgment” with 
EMT, and “sensitivity to problems” with 
EMI. The coincidence of these four fac- 
tors with structure-of-intellect abilities, 
however, was by no means regarded as 
firmly established. Consideration of rela- 
tions between the four factors and the 
model led to the decision to develop new 
tests for all six of the hypothesized abili- 
ties. Some tests representing the previous 
factors or modifications of them were in- 
cluded in the new test battery in order to 
provide continuity with previous work. 

Twenty-two tests were developed or ex- 
tensively revised for this study. Eight pre- 
test booklets of preliminary forms were 
administered to a number of classes at 
the University of Southern California. Two 
pilot studies were also conducted using 
high-school students who were expected to
be similar to those to whom the final bat-
tery was to be administered. These pre-
testings were designed to obtain various
information concerning the tests and the 
examinee reactions to them. Information 
was obtained on such matters as the ap- 
propriate level of item difficulties, E's com- 
prehension of test instructions, test relia- 
bilities, optimal time requirements for 
newly developed tests, and some prelim- 
inary intercorrelations. Extensive item 
analysis was conducted whenever the pre- 
testing information indicated low reliabili- 
ties. 

The hypothesized factors are defined 
below and the names of the tests used to 
represent them in the new battery are 
given. Further information regarding the 
tests, with sample items, may be found in 
Nihira, Guilford, Hoepfner, & Merrifield 
(1964). 

EMU—Evaluation of semantic units: 
Units are relatively segregated or circum- 
scribed items of information. A semantic 
unit may be a word, an object, an idea, or 
a verbalized concept, depending upon the 
nature of the information involved. In 
this study, the EMU factor is hypothe- 
sized to be the ability to evaluate the 
suitability or adequacy of a word or an 
object in terms of given criteria. Three 
tests were developed as measures of EMU. 

In the test, Double Descriptions, E is to 
evaluate objects according to how they 
meet two stated criteria in the form of at- 
tributes. In each item, four alternative ob- 
jects are to be judged, with the keyed ob- 
ject best. 

The task in Synonyms is to evaluate the 
identity or degree of similarity of the 
meanings of words. This test is like most 
multiple-choice verbal-comprehension tests, 
except that all alternatives are synonyms 
of the given word. A choice must be made 
on the basis of subtle differences in mean-
ing.

Word Substitution, the third test devel- 
oped for EMU, asks E to evaluate a group 
of words in terms of their relative suitabil- 
ity in a given sentence. As in Synonyms, 
the decision regarding substitution of a 
word in a sentence is to be made among
fine shadings of meaning, since any al-
ternative could conceivably be chosen and
would fit acceptably into the sentence. 

EMC—Evaluation of semantic classes: 
Classes are recognized sets of units grouped 
by virtue of their common properties. In
this study, the EMC factor is hypothesized 
as the ability to evaluate suitability or 
adequacy of a class grouping to represent 
a given word, object, or a set of objects. 

From given alternatives in the test, Best 
Word Class, E is to choose the class name 
that best represents a given word or object. 
The alternative class names are all cor- 
rect; choices are to be made on the basis of 
the criterion of how well the name covers 
the class properties of the object. 

In the test, Best Word Pairs, E is to 
choose the pair that makes the best class 
from given pairs of words. The choice 
among word pairs, all pairs having proper- 
ties in common, is to be made on the basis 
of the number and importance of shared 
properties. 

The task in Class Name Selection is to 
choose the class name that best represents
a set of words or objects from given alter- 
natives. The alternative class names are all 
correct; choices are to be made in terms of 
aptness of the name, that is, which one de- 
scribes the class most exactly. 

EMR—Evaluation of semantic rela- 
tions: Relations are recognized connections 
between units of information based upon 
variables that apply to them. In this study, 
the EMR factor is hypothesized to be the 
ability to evaluate relations between words 
or ideas. The factor called “logical evalua- 
tion,” often identified in previous studies, 
was eventually assigned to the cell for 
EMR in the structure-of-intellect model. 
Therefore, Logical Reasoning, one of the 
tests that has consistently identified the 
logical-evaluation factor, was included in 
the present study to provide continuity 
with previous work. 

In the test, Logical Reasoning, E is to 
choose the correct conclusion that can be 
drawn from two given premises. Only one 
of the alternatives in this syllogistic test is 
correct in each item. The incorrect ones 
are not obvious, however, since they repre-
sent common errors made in syllogistic
reasoning. 

In Best Trend Name, E is to select the 
word that best describes the order of four
given words. All three alternative trend 
names refer to trends that at least partially 
describe the four words, but one describes 
a trend with greatest justification. This 
test is parallel to Seeing Trends, in which 
E names each trend, a test that probably 
measures both CMR and NMU. 

The task in Matched Verbal Relations is 
to select the pair of words that represents 
the relationship most similar to the rela- 
tionship given in the model pair of words. 
The difficulty is in the choice among alter- 
natives, and not in the discovery of the 
relationship between the model word pair. 
The alternatives all have some plausibil- 
ity in that they are related in some way 
with one another and with the model pair 
of words, but one pair exhibits a relation- 
ship most like that in the model word 
pair. 

The fourth test, Verbal Analogies III, 
asks E to choose the alternative that is the
best completion of the analogy, the relation 
between the first two words being fairly 
obvious. The keyed alternative has a 
greater similarity of relation to the com- 
pleted analogy than do the other alterna- 
tives. 

The E is to choose a word that is similar
in meaning to each of two other given 
words in the test, Word Linkage. The word 
chosen must be related to the given words 
in two different ways. Only one alternative 
clearly conforms to both criteria; the re- 
maining alternatives do not conform to one 
of them. 

EMS—Evaluation of semantic systems: 
Systems are organized or structured aggre- 
gates of information, or complexes of inter- 
related parts. Present knowledge concern- 
ing the CMS and NMS factors suggests the 
diversity of characteristics of semantic 
systems. It seems that the semantic system 
can be a sentence, a complex of relation- 
ships among words, a problem, a sequence 
of events, or a common situation. For this 
reason, the tests developed for the EMS 
factor sampled a variety of problem items
that could be expected to involve semantic
systems. 

Complete Thoughts is a new test in 
which E is to decide whether or not a 
given sentence expresses a complete 
thought. Sentences of the kind that char- 
acteristically confuse students as to 
whether they express complete thoughts 
were selected for this test. Completeness is 
the criterion. 

From given alternatives in the test, Im- 
portant Facts, E is to select the most im- 
portant and the least important facts 
needed to solve a problem. All the alterna- 
tive facts could possibly play a part in 
solving the problem, but one is most im- 
portant under the given circumstances and 
one is least important. 

Sentensense is like Complete Thoughts 
in that both present E with sentences. In 
Sentensense, E is to evaluate the internal
consistency of the ideas or events expressed 
in each sentence. On the surface all sen- 
tences may appear to be meaningfully con- 
sistent, but some are not. This test was 
conceived as a verbal counterpart to the 
next one, Unlikely Things, which presents 
internal inconsistencies in pictorial form. 

The task in Unlikely Things is to select 
from four given alternatives the two more 
unlikely things in sketches of a common 
situation. Judgments of unlikeliness in this 
test must be made on the basis of apparent 
violation of physical or conventional prin- 
ciples of varying degrees of possibility or 
on the basis of internal consistency. This 
test is a multiple-choice form of the test, 
Unusual Details, which had strongly 
helped to determine the factor of "experi- 
ential evaluation," later identified with 
the factor EMS. 

Word Systems is a new test parallel to 
systems tests of figural matrices. E is to 
evaluate the internal consistency of a 
matrix of words arranged in terms of 
three meaningful rows and columns. None 
of the three word-matrix alternatives is 
completely consistent, but one is most con- 
sistent and one is least consistent. 

EMT—Evaluation of semantic trans- 
formations: A transformation is defined as 
a change. In the semantic area, this usu-
ally means a change of interpretation or
use of various objects, ideas, concepts, and 
other verbal-meaningful materials. In this 
study, EMT has been hypothesized as the 
ability to evaluate changes of interpreta- 
tion of various objects and stories. The 
following tests have been adapted from 
tests for the NMT and DMT factors, em- 
phasizing their evaluative aspects. 

Product Choice is a test similar to the 
test, Object Synthesis. The E is to select an 
object that can be made most adequately 
for a specified purpose by combining two 
given objects. The E must evaluate the
adequacy of the unconventional uses of the 
common objects to choose the best answer. 

In the test, Story Titles, E is to choose 
the best title that gives a new interpreta- 
tion for a short story. One alternative title 
is always best on the basis of relevance to 
the story and provision of a new view or 
interpretation of the story. 

From a set of alternatives, E is to select 
an object that can be used most ade- 
quately for a specified, unusual purpose in 
the test, Useful Changes. The E must judge 
which object, used unconventionally,
would most adequately perform the given 
task. All the alternatives could be used to 
perform the task, but one is better on the 
basis of practicality and efficiency. 

EMI—Evaluation of semantic implica- 
tions. The factor called “sensitivity to 
problems,” discussed in the introduction, 
had been assigned to the cell for EMI. For 
this reason, two of the tests used to measure 
the factor (Apparatus Test and Seeing 
Problems) were included in this battery as 
potential measures of EMI, in addition to 
the new tests developed parallel to tests 
of semantic implications in the areas of 
cognition and divergent production. 

Apparatus Test asks E to suggest two 
improvements for each of a number of com- 
mon appliances. Responses are scored on 
the basis of whether E senses the need for
realistic and desirable improvements in 
the objects. 

The next two tests were included as 
marker tests for the factor previously 
known as “judgment,” which they had 
helped to define (Berger et al., 1957).
Although “judgment” had been tentatively
identified with factor EMT, the tests do 
not resemble very much the new tests de- 
veloped for that factor. They were given a 
place among the EMI tests in this study, 
without serious expectation that they 
would cohere with that group in the new 
analysis. 

In Commonsense Judgment I, E is to select
the two best reasons why a proposed
plan is faulty among the five given alter- 
natives. All the reasons were designed to 
appear reasonable, but two are either more 
important or seem to be more apt. In the 
test, Commonsense Judgment II, E is to 
select the best method of demonstrating the 
truth of a given statement. All the alternative
methods would demonstrate the statement’s
truth, but with varying degrees of
success and physical possibility. The best 
method is successful, efficient, and possible 
to carry out. 

The task in Seeing Problems is to list 
problems that might arise in connection 
with common objects. This test was 
planned to measure E’s sensitivity to consequences
and other implications of the
use of objects. 

Sentence Selection is a new EMI test designed
to measure E’s ability to evaluate
extrapolations. The E is to select the 
statement that is most probably true, in 
view of given information. Choice of alter- 
native conclusions is to be made on the 
basis of the conclusion's necessity. All conclusions
could be true, but one is more
fully determined by the given statement. 

Word Extensions employs items containing
logically necessary implications of
words. The E is to select the name of an 
object or attribute that is always implied 
by a given word. The alternatives are all 
implied by the given word, but one is in- 
variably implied whereas the others are 
implied only with some restriction.


PROCEDURES 


The Symbolic Study 


The sample utilized in this study consisted of 
the entire senior-class student population of a 
Southern California high school? Although 131 


?The authors are indebted to the administra- 
boys and 180 girls participated in the testing, the
sample was later reduced to 86 boys and 139 girls, 
for whom complete test data for all experimental 
factor tests were available. The only criterion for 
exclusion of Es from the sample was incomplete
data on these measures. 

Age and IQ information was available for 219 
and 199 students, respectively. Generalizing from 
such demographic data available for most of the
sample utilized, the estimated mean age was 17.4 
years. The estimated mean IQ, computed from 
combinations of scores obtained from the several 
IQ measures, which were variously administered 
between the eighth and eleventh grades, was 110.4. 
Although IQs ranged from 80 to 151, no students 
were deleted from the sample on the basis of ex- 
treme indexes of general intelligence. 

The total sample of Es was tested in the morn- 
ings and afternoons of 2 consecutive days. Each 
testing session required approximately 2 hours. The 
tests of each factor were so arranged that order 
effects and fatigue effects would be approximately 
equal for all factors expected to be demonstrated. 

The testing conditions under which the battery 
was administered were almost ideal, with one 
major exception. The days of the test administra- 
tion, unfortunately, were only 4 and 5 days after 
the tragic death of President John F. Kennedy. 
Both administrators and school personnel were 
aware that after the day of national mourning, 
the preceding Monday, the students were still 
disturbed and restless. The effects of the national 
tragedy upon the results of this study are unknown. 

Scoring criteria for the marker tests were developed
from the scoring guides employed in previous
studies. Scoring criteria for the newly developed
experimental tests were based upon
preliminary results of pretestings with university
students in undergraduate psychology courses.

Frequency distributions were obtained for all 
part and total scores to determine whether the 
variables would meet the requirements of the 
Pearson r coefficient. A normalizing transformation
was applied to those variables that were
moderately skewed or exceedingly platykurtic. 
Extremely skewed or truncated variables were 
dichotomized near their medians. Descriptions of 
the frequency distributions and transformations for 
all the variables are listed in Table 1. 
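
The two kinds of rescaling described above can be illustrated with a minimal sketch. It assumes a rank-based normalizing transformation (each score replaced by the normal deviate of its mid-rank percentile) and a strict split at the median; the source does not state which normalizing procedure was used, and the function names are hypothetical.

```python
from statistics import NormalDist, median


def normalize_scores(scores):
    """Rank-based normalizing transformation (an assumed procedure).

    Each score is replaced by the standard normal deviate of its
    mid-rank percentile; tied scores receive the same value.
    """
    n = len(scores)
    unit_normal = NormalDist()
    out = []
    for s in scores:
        below = sum(1 for v in scores if v < s)
        ties = sum(1 for v in scores if v == s)
        pct = (below + 0.5 * ties) / n  # always strictly between 0 and 1
        out.append(unit_normal.inv_cdf(pct))
    return out


def dichotomize_at_median(scores):
    """Score 1 for values above the median and 0 otherwise."""
    m = median(scores)
    return [1 if s > m else 0 for s in scores]
```

The source dichotomized "near" the medians, so the exact cutting points in the study may have differed from the strict median used here.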

After it was ascertained that all the part-score 
data and total-score data, raw or scaled, met the 
requirements of the Pearson r or its approxima- 
tions, reliability estimates were obtained from the 
raw scores of the tests. For all tests with two or 
more parts, Spearman-Brown reliability estimates 
were computed. Kuder-Richardson estimates of 
reliability were computed for all one-part tests 
that showed no evidence of speeding. Reliability 
estimates for one-part tests, wherein each item had 
a large possible range of scores, were computed by 
a formula suggested by Gulliksen (1950, p. 378). 
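
The first two reliability estimates named above follow standard formulas, sketched here under stated assumptions: the Spearman-Brown step-up is applied to the correlation between equal-length parts, and the Kuder-Richardson estimate is taken to be formula 20 with population variances. The function names and data layout are hypothetical, and the Gulliksen formula is not reproduced because the source does not give it.

```python
def spearman_brown(r_parts, k=2):
    """Reliability of a test k times as long as the correlated parts."""
    return k * r_parts / (1 + (k - 1) * r_parts)


def kr20(item_scores):
    """Kuder-Richardson formula 20 for dichotomous (0/1) items.

    item_scores is a list of examinee rows, one 0/1 entry per item.
    """
    n = len(item_scores)
    k = len(item_scores[0])
    totals = [sum(row) for row in item_scores]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_scores) / n  # proportion passing item j
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)
```

For example, two parts correlating .50 give a stepped-up estimate of about .67 for the full test.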


tion, staff, and students of Claremont High
School for their splendid cooperation.


Reliabilities of one-part speeded tests and of 
school measures could not be estimated. Their 
communalities were expected to approximate the 
necessary estimates of reliability. 

The means and standard deviations of the 
variates are also listed in Table 1. In all but three 
cases, these descriptive statistics are based upon 
raw scores, before any transformations were ap- 
plied. 

Because the score matrix was incomplete and 
the data were differentially scaled, resulting in 
different kinds of correlation coefficients, the correlation
matrix was obtained from a program that
computes correlation coefficients between variables 
based upon the total number of individuals for 
whom scores are available. Most of the correla- 
tion coefficients in Table 2, therefore, are based 
upon the whole sample of 225 Es, but some are 
based upon the three variables for which not all 
E’s scores were available. These variables and the 
number of scores available are: Variable 52, 187 
scores; Variable 53, 109 scores; and Variable 54, 
107 scores. The attenuated samples for these vari- 
ables taken independently, of course, result in 
further attenuation of sample size for the coeffi- 
cients among them. The coefficient between Vari- 
ables 52 and 54 was computed from a common 
sample of only 66 Es. This sample size was the 
smallest from which any coefficients were com- 
puted and was considerably smaller than the next 
smallest sample of 87 for the correlation between 
Variables 52 and 53. 
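
The missing-data handling described above amounts to what is now usually called pairwise deletion. A minimal sketch follows, assuming missing scores are coded as `None`; the function names are hypothetical and this is not the program the authors used.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_correlations(columns):
    """Correlate every pair of variables on the cases that have both scores.

    columns is a list of variables, each a list of scores with None for
    a missing score. Returns (r, n): the correlation matrix and the
    matrix of sample sizes on which each coefficient is based.
    """
    k = len(columns)
    r = [[1.0] * k for _ in range(k)]
    n = [[0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i, k):
            pairs = [(a, b) for a, b in zip(columns[i], columns[j])
                     if a is not None and b is not None]
            n[i][j] = n[j][i] = len(pairs)
            if i != j:
                xs, ys = zip(*pairs)
                r[i][j] = r[j][i] = pearson_r(xs, ys)
    return r, n
```

Returning the matrix of sample sizes alongside the coefficients makes the attenuation noted in the text visible, such as the common sample of 66 for Variables 52 and 54.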

Such variations in sample size upon which cor- 
relation coefficients are based introduce addi- 
tional possibilities of error into the correlation 
matrix due to the necessary generalization that 
each coefficient in the matrix estimates equally 
accurately the actual intercorrelation between the 
variables. For each pair of variables, the coefficient
based on the available Es is the best estimate,
and since the Es in the reduced samples
appeared to have been selected on irrelevant vari- 
ables, that is, their attendance at the school when 
the tests were administered, any bias was thought 
to be negligible. 

An additional consideration of the program 
employed is that it computes product-moment 
correlation coefficients upon any input data. This 
means that some coefficients are point-biserial r's
and some are phi coefficients. Standard corrections 
(Guilford, 1965, pp. 324, 353) were applied to each 
kind of coefficient to improve it as an estimate of 
the Pearson r. Thus, the coefficients reported in 
Table 2 are all Pearson r coefficients or estimates 
of the Pearson r. 
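
As an illustration of this kind of correction, the sketch below uses two textbook conversions under a bivariate-normal assumption: the biserial estimate obtained from a point-biserial r, and the estimate obtained from phi when both variables are split exactly at their medians. These are assumed stand-ins; the source does not spell out the corrections on the cited pages of Guilford (1965), and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def biserial_from_point_biserial(r_pb, p):
    """Estimate of Pearson r from a point-biserial r.

    p is the proportion of cases in the upper category of the
    dichotomized variable; y is the normal ordinate at the cut.
    """
    unit_normal = NormalDist()
    y = unit_normal.pdf(unit_normal.inv_cdf(p))
    return r_pb * math.sqrt(p * (1 - p)) / y


def pearson_from_phi_median_split(phi):
    """Estimate of Pearson r from phi when both cuts are at the median."""
    return math.sin(math.pi * phi / 2)
```

With a median split (p = .50) the biserial estimate is about 1.25 times the point-biserial coefficient, which shows why uncorrected coefficients would understate the correlations involving dichotomized variables.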

The correlation matrix was submitted to an 
iterative communality-estimation program for es- 
timates of communalities to be inserted into the 
principal diagonal for factor analysis. The iterated, 
stabilized communality estimates were put into 
the diagonal cells of the correlation matrix, and 
the matrix was submitted to a program that ex- 
tracts principal-axes factors until the eigenvalues 
become negative, at which point extractions are 
TABLE 1
MEANS, STANDARD DEVIATIONS, AND RELIABILITIES OF SYMBOLIC SCORES
Test Mean Standard deviation Reliabilityᶜ

1. Abbreviations-ESIOIB 11.67 4.20 .204 
2. Best Letter Set-ESS03A 13.47 5.41 .56 
3. Best Number Class-ESCO1A 22.678 6.74 87 
4. Best Number Pairs-ESC02A 17.32 5.99 .73 
5. Camouflaged Words-NSTOIA 8.33 2.89 748 

6. Circle Reasoning-CSS01D 6.84 2.72 674 
7. Correct Letter Orders-ESS04A 16.04 8.17 .58 
8. Correct Number Series-ESS05A. 19.66 9.67 74 

9. Correct Spelling-ESU04A 36.98 11.49 E 
10. Decoding-EST01A 16.20 6.39 74 
11. Derivations-ESU08A 99.96 20.08 .81 
12. Disemvowelled Words-CSU04B 11.53 4.21 «194 
13. Familiar Letter Combinations-ESU05A 14.23 6.59 .49d 
14. Form Reasoning-N8102C 17.49* 5.26 .964 
15. Identical Forms-EFU02A 38.12 7.30 63° 
16. Jumbled Words-EST03A 36.83* 10.86 75 
17. Letter Problems-ESI02A 14.95 8.88 .88 
18. Letter Triangle-CSS02B 5.68 2.85 554 
19. Letter **U"-ESUO6A 53.63 11.28 .84 
20. Marking Speed Test 94.24^ 18.12 „44e 
21. Number Classification-CSC03B 11.41^ 3.35 724 
22. Number Grouping-DSC01B 14.10 4.42 .79 
23. Number-Group Naming-CSC05A 10.17* 2.12 .T64 
24. Numerical Operations-MSI01B 22.23 8.62 78 
25. Operations Sequence-NS801B 12.27 5.29 .80 
26. Perceptual Speed-EFUO01A 48.01 9.21 .05* 
27. Related Words I-ESR03A 14.00 5.41 dut 
28. S Test-ESI04A 8.55 3.42 dur 
29. Seeing Trends II-CSROIB 8.16 3.25 n 
30. Series Relations-ESS06A 9.93 7.22 ihe 
31. Sign Changes-NSIOIA 17.49 5.05 j^ 
32. Sign Changes II-ESRO1C 17.31" 3.64 iz 
33. Similar Pairs-ESRO04A 20.14! 7.40 on 
34. Sound Grouping-ESC04A 11.92 b. rat 
35. Symbol Grouping-CSI01B 11.22 36H 0 
36. Symbol Identities-ESU07A 72.12 pe ie 
37. Symbol Manipulation-ESR02C Bra sarod ine 
38. Symbol Reasoning-ESI03A i s 98 a 
39. Typing Errors-EST02A 9. A 50 ‘ 67 
40. Varied Symbols-DSC03B 10.92 pd 14 
41. Way-Out Numbers-ESS07A 28.48 p neo 
42. Word Changes-NSS02C d SEL p 
43. Word Choice-ESC03A HET 4 154 
44. Word Combinations-CSU06A ie 9.14 "5 
45. Word Patterns-CS103C 10.76 464 E 
46. Word Relations-CSR02B 96:60 7.91 “gat 
47. Word Transformation-NST02B 20:35 5. 45 n 
48. ITED General Vocabulary-CMU 4T 88 1162 Em 
49. PSAT Verbal-CMU and EE ap 


50. SCAT Verbal-CMU 


ᵃ Total scores dichotomized at the medians for intercorrelations.

ᵇ Total scores C-scaled for intercorrelations.

ᶜ All estimates of reliability are Spearman-Brown estimates, except as noted.

ᵈ Kuder-Richardson estimate of reliability.

ᵉ Communality entered as reliability estimate.

ᶠ Reliability estimated through formula 21.21, in Gulliksen (1950).
stopped. This procedure resulted in the extraction
of 32 real principal-axes factors. The first 18 fac- 
tors accounted for 93.7% of the total variance of 
the 32-factor matrix and are presented in Table 3 
as the unrotated factor matrix. 

The principal axes were submitted to a program 
designed to rotate the loadings as closely as pos- 
sible to a fixed target matrix of loadings (Cliff, 
1964). The construction of the target matrix de- 
pended upon the intuitively inferred structure of 
the empirical matrix, simple structure, positive 
manifold, and the factor hypotheses. Successive 
adjustments of the target matrix effected con- 
siderable improvement in the empirical rotated 
matrix on all four criteria. The result of the final 
target-oriented rotation upon the principal-axes 
matrix is presented in Table 4 as the rotated factor 
matrix. 
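
One way to formalize a rotation toward a fixed target is the orthogonal least-squares (Procrustes) problem, stated here as an assumption about the general form of the procedure rather than as a description of the program actually used. The symbols are introduced for illustration: A is the unrotated principal-axes loading matrix, B the target matrix, and T the orthogonal transformation sought.

```latex
\min_{T}\ \lVert A T - B \rVert_F^{2}
\quad \text{subject to} \quad T^{\top} T = I ,
\qquad
A^{\top} B = U \Sigma V^{\top}
\ \Longrightarrow\
T = U V^{\top} .
```

The rotated loadings are then AT. Because T is orthogonal, the communalities of the tests are unchanged by the rotation, so successive adjustments of the target alter only how the common variance is distributed among the factors.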


The Semantic Study 


A high school in a newly developed industrial 
community in Los Angeles County cooperated in 
this study. The final testing battery was administered
in 2-hour sessions on each of 4 days, interspersed
during a 2-week period. Approximately 400
eleventh-grade students participated in at least 
one of the testing sessions. The tests were or- 
ganized in eight booklets, each of which required 
about 55 minutes. Rest periods were introduced 
between administrations of booklets. 

An inspection of answer sheets indicated that a 
language IQ of at least 95 is necessary for ade- 
quate understanding of some of the test instruc- 
tions. For this reason, the final sample was com- 
posed of students who completed all of the test 
booklets and who had both California Test of 
Mental Maturity language and nonlanguage IQs 
of 95 or above. 

Reliabilities were estimated by correlating part 
scores and applying the Spearman-Brown formula. 
In an effort to increase reliabilities, several tests 
were item analyzed when their reliabilities were 
found to be lower than .50. After the item analyses, 
the internal-consistency reliabilities were esti- 
mated for these tests. Test reliabilities are re- 
ported in Table 5 along with means and standard 
deviations. 

In this study, two separately timed parts of 
each of six reference tests were employed to de- 
fine the pertinent reference factors. The reliabilities 
of these reference tests are indicated by correla- 
tions between the two separately timed parts, not 
by applying the Spearman-Brown formula. 

The use of alternate forms of certain factor 
tests should increase the chances of appropriate 
location of the axes for the factors that those tests 
represent, but there is the disadvantage that load- 
ings in those tests may involve specific as well as 
common-factor variance. Loadings for the factors 


The authors wish to thank the administration,
staff, and students of the La Puente School
District for their excellent cooperation. 


in other tests should presumably not be affected. 
Since it is this reference-factor involvement in the 
experimental test that is to be accounted for, the 
use of alternate forms of marker tests can thus 
be justified. 

Frequency distributions were inspected for ir- 
regularity. All test scores were accepted as appro- 
priate for Pearson product-moment correlation 
coefficients. The correlation matrix is given in 
Table 6. 

In extracting principal-axes factors, a com- 
puter program for iterative factor analysis was 
applied. The iterative factor-analysis program ex- 
tracts a specified number of principal-axes factors 
using estimated communality values. In the first 
iteration, the program reextracts principal-axes 
factors using the computed communalities ob- 
tained from the result of the first extraction. 
Starting with the highest correlation in each col- 
umn as the initial estimate of communalities, the 
iterative factor-analysis program iterated the 
principal-axis extraction process nine times until 
the communalities became stabilized. The com- 
munalities changed less than .01 between the eighth 
and ninth cycles of iteration. 
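
The iterative procedure just described can be sketched as follows. This is a minimal illustration, not the program used in the study: the eigenvalue routine (cyclic Jacobi rotations), the function names, and the iteration cap are assumptions, while the starting communalities (highest correlation in each column) and the .01 stopping rule follow the text.

```python
import math


def jacobi_eigen(matrix, sweeps=100):
    """Eigenvalues and eigenvectors (columns) of a symmetric matrix."""
    n = len(matrix)
    a = [[float(x) for x in row] for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off < 1e-22:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                if a[p][p] == a[q][q]:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2 * a[p][q] / (a[q][q] - a[p][p]))
                c, s = math.cos(theta), math.sin(theta)
                for m in (a, v):  # post-multiply a and v by the rotation
                    for k in range(n):
                        mp, mq = m[k][p], m[k][q]
                        m[k][p], m[k][q] = c * mp - s * mq, s * mp + c * mq
                for k in range(n):  # pre-multiply a by the transposed rotation
                    ap, aq = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * ap - s * aq, s * ap + c * aq
    return [a[i][i] for i in range(n)], v


def principal_axes(r, n_factors, tol=0.01, max_iter=200):
    """Iterated principal-axes factoring of a correlation matrix r."""
    n = len(r)
    # Initial communalities: highest absolute correlation in each column.
    h2 = [max(abs(r[i][j]) for i in range(n) if i != j) for j in range(n)]
    for _ in range(max_iter):
        reduced = [[h2[i] if i == j else r[i][j] for j in range(n)]
                   for i in range(n)]
        values, vectors = jacobi_eigen(reduced)
        top = sorted(range(n), key=lambda i: values[i], reverse=True)[:n_factors]
        loadings = [[vectors[i][k] * math.sqrt(max(values[k], 0.0)) for k in top]
                    for i in range(n)]
        new_h2 = [sum(x * x for x in row) for row in loadings]
        change = max(abs(x - y) for x, y in zip(new_h2, h2))
        h2 = new_h2
        if change < tol:  # communalities have stabilized
            break
    return loadings, h2
```

The signs of the columns returned by such a routine are arbitrary, so a factor may need to be reflected before interpretation.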

The extraction of principal-axes factors is equiv- 
alent to choosing a set of factors in decreasing 
order of their contribution to the total variance 
of the matrix. This principle provides a rough 
numerical guide as to the number of factors to be 
retained for rotational solution. Since the ob- 
served correlations are subject to error of estimate, 
it was decided to retain the largest number of ex- 
tracted factors whose cumulative contribution ac- 
counted for less than 95% of the total variance. 
According to this criterion, 14 principal axes were 
retained and iterated and are presented in Table 7. 
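
The retention rule can be written as a short function. The sketch assumes that "total variance" means the sum of the positive eigenvalues of the matrix; the source does not define the total explicitly, and the function name is hypothetical.

```python
def n_factors_to_retain(eigenvalues, threshold=0.95):
    """Largest number of leading factors whose cumulative share of the
    total (positive-eigenvalue) variance stays below the threshold."""
    positive = sorted((v for v in eigenvalues if v > 0), reverse=True)
    total = sum(positive)
    kept, running = 0, 0.0
    for v in positive:
        if (running + v) / total >= threshold:
            break
        running += v
        kept += 1
    return kept
```

With eigenvalues of 5, 3, 1, 0.6, and 0.4, for instance, three factors are kept, since the fourth would raise the cumulative share from 90% to 96%.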

Graphic orthogonal rotations were used to lo- 
cate the new reference axes. During the initial 
phase of the rotation processes, the objective cri- 
terion of simple structure was the primary con- 
sideration. The secondary aim was to spread the 
variance of factor loadings as equally as possible 
among the factors. After each axis had been ro- 
tated at least once, and the reference factors de- 
fined by the marker tests began to emerge, posi- 
tive manifold and psychological meaningfulness 
became the important criteria. The final rotations
consisted of computer adjustments, employing
Cliff’s (1964) procedure, aimed at the improvement
of positive manifold and simple structure.
The rotated factor matrix is presented in Table 8. 


RESULTS OF THE FACTOR ANALYSES


The interpretations of rotated factors 
rest principally upon tests with factor load- 
ings of .30 or greater. The names of the 
tests and the factor for which each test was 
designed are listed preceding the discussion 
for each obtained factor. A test is listed if 
it has a loading as large as .30. If a test 
TABLE 3
UNROTATED SYMBOLIC FACTOR MATRIX
TABLE 5
MEANS, STANDARD DEVIATIONS, AND RELIABILITIES OF SEMANTIC SCORES
Test Mean Standard deviation Reliabilityᵃ
51. Reading Vocabulary (CAT) 103.8 24.4 .88ᵇ
52. Arithmetic Reasoning (CAT) 101.1 27.4 .89ᵇ
53. Apparatus Test-EMI05C 9.7 4.3 hel,
54. Best Trend Name-EMR01A 11.4 3.6 .73
55. Best Word Class-EMC01A 20.7 3.4 .61e
56. Best Word Pairs-EMC02A 18.8 4.7 .73
57. Class Name Selection-EMC03A 21.4 3.7 .63
58. Commonsense Judgment I-EMI04A 5.6 1.6 .524
59. Commonsense Judgment II-EMI04B 4.2 1.5 .26ᵈ
60. Complete Thoughts-EMS01A 39.0 6.5 Ba
61. Double Descriptions-EMU01A 35.2 4.2 .61
62. Important Facts-EMS02A 11.8 3.0 .49
63. Logical Reasoning, Form A-2 12.4 4.5 .82
64. Matched Verbal Relations-EMR02A 14.7 4.8 .65
65. Pertinent Questions, Form A-J (Part I) 9.2 2.6 .65ᵉ
66. Pertinent Questions, Form A-J (Part II) 11.0 2.6 .65ᵉ
67. Picture Gestalt-NMT03B 20.0 2.4 .64
68. Possible Jobs-DMI03A (Part I) 8.0 3.0 .67ᵉ
69. Possible Jobs-DMI03A (Part II) 7.7 3.5 .67ᵉ
70. Product Choice-EMT01A 34.6 5.6 .57ᶠ
71. Seeing Problems-EMI01B 14.7 5.2 .43
72. Sentence Selection-EMI02A 9.6 2.7 .46
73. Sentensense-EMS03A 15.0 2.3 .50*
74. Ship Destination Test, Form A-2 22.5 9.9 674
75. Similarities-CMT02B (Part I) 13.5 3.2 .54ᵉ
76. Similarities-CMT02B (Part II) 10.2 3.3 .54ᵉ
77. Story Titles-EMT02A 21.7 7.0 .67
78. Synonyms-EMU02A 22.2 3.9 .39
79. Unlikely Things-EMS04A 23.6 2.8 .54
80. Useful Changes-EMT03A 16.7 3.7 .58
81. Verbal Analogies I-CMR01A (Part I) 5.2 1.9 .38ᵉ
82. Verbal Analogies I-CMR01A (Part II) 9.2 2.6 .38ᵉ
83. Verbal Analogies III-EMR04A 8.1 3.1 .60
84. Verbal Classification-CMC02A (Part I) 25.5 8.1 .64ᵉ
85. Verbal Classification-CMC02A (Part II) 30.4 10.1 .64ᵉ
86. Verbal Comprehension, Form B 15.4 3.7 .68
87. Word Completion-CMU01A 9.2 3.6 N
88. Word Extensions-EMI03A 17.5 4.8 .63
89. Word Linkage-EMR03A 22.0 4.6 .44
90. Word Substitution-EMU03A 16.1 3.5 .51
91. Word Systems-EMS05A 11.6 3.3 .28ᶠ

ᵃ All reliability estimates are based upon alternate-form correlations extended by the Spearman-Brown formula, except as noted.
ᵇ Reported by the test publisher.
ᶜ Kuder-Richardson formula 20.
ᵈ Communality as the lower bound estimate of reliability.
ᵉ Correlation between the two parts of the same reference test.
ᶠ Hoyt’s method for estimating reliability.


has loadings of .30 or greater on other fac- 
tors in the solution, those loadings and 
their factors are mentioned in parentheses. 
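
The listing rule can be illustrated with one row of the rotated semantic matrix. The sketch below uses the loadings printed for Test 51, Reading Vocabulary, in Table 8 (decimal points omitted there), as read from the scanned table; the function names are hypothetical. It also checks that the communality equals the sum of squared loadings across the orthogonal factors.

```python
FACTORS = ["CMU", "CMC", "CMR", "CMS", "CMT", "CMI", "DMI",
           "NMT", "EMU", "EMC", "EMR", "EMS", "EMT", "EMI"]

# Table 8 entries for Test 51 (Reading Vocabulary), decimal points omitted.
ROW_51 = [70, 19, 11, 47, 8, -3, 15, -3, 3, 17, 10, -4, 6, -2]


def salient_factors(row, cutoff=0.30):
    """Factors on which a test has an absolute loading of at least cutoff."""
    loadings = [x / 100 for x in row]
    return [(name, value) for name, value in zip(FACTORS, loadings)
            if abs(value) >= cutoff]


def communality(row):
    """Sum of squared loadings over all orthogonal factors."""
    return sum((x / 100) ** 2 for x in row)
```

For this row the salient factors are CMU (.70) and CMS (.47), and the squared loadings sum to .8352, which agrees with the tabled communality of .84.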

Emphasis is placed upon the discussion 
of newly isolated factors of evaluation. The 
factors of cognition and other reference fac- 
tors will be discussed to the extent that this 


helps to understand the nature of evaluative 
factors, or throws new light on the reference 
factors and their tests. Extensive discussion 
of all reference factors in this study has 
been given elsewhere in the series of Re- 
ports from the Psychological Laboratory 
at the University of Southern California. 
TABLE 7
UNROTATED SEMANTIC FACTOR MATRIX
Variable AY) Bei Mes | Daa Soon NG GME Monty E TE Gaai: |N i BR 
51. 64 05|—35| 12| 36| 20| 15 |—16 |-01 |-12 |—06 |-21 |-09 | 04 | 84 
52. 66 12|-09| 08| 41| 20| 07 |—06 |—12 |-01 |-07 |-06 | 01] 00| 70 
53. 36 |-35 | 08| 08|—00| 09| 05|—03|—15| 23|—02|—00| 08|—00 | 36 
54. 69 13|-01|-07| 07 | 14 |—20 | 01|—04| 03|—03 | 11| 09|—03| 59 
55. 58 12|-06| 29| 13|—20| 15| 08|—04| 25|—02| 12| 02,—00, 60 
50. 58 13| 11| 04| 11|-08| 02| 06 |—07 |—03| 03| 01| 04|—06 | 40 
57. 66 10 |-07 | 04 |—02 |—20 | 14| 04|—04| 07| 00| 13 |-03| 23 59 
58. 56 09 | 02| 03|—10|—22| 05| 03|—04| 11|—06 |-14 |-15 | 09 | 45 
59. 20 |-17|-10| 25| 01| 14| 18| 18|—05 | 09 |—01| 08 —01|—14| 26 
60. 65 10 |—07 |—07 |-13 |-20 | 02| 00|—-07 | 15|—17 |-08 | 11| 08| 58 
61. 43 |-05| 29| 25|—33 | 13| 02|—19 |-20 |-18 |-05 | 03 |—14 | 04 59 
62. 54 os| 06 |—03 | 01|—21| 07|—07|—05| 07|—09 |-02 —12 | 10| 40 
63. 67 15| 11| 02|—04 |-01 |-13 |j-11 |j-00 | 02 |—05| 06| 14 12 | 56 
64. s0 | 24| 09|—04|—05 | 04 |—09 |-06 | 06 |—06 |—04 09 |—00 |—06 | 74 
65. 29 |-721]|-12| 11| 09 |—08 |-09 | 20|—11 |-00 —00| 00} 10, 02, 70 
66. 47 |—58 |-04 |-02 | 06 |-18 |-16 | 17 |-11 —03 |-04 |-04 |-03 |-17 | 70 
67. 40 |—31 | 11| 02| 04| 16| 12 |—17 |—22 —09| 13| 03| %6 | 07| 42 
68. 50 |—34| 14|—56| 12| 09| 26|—10| 04 17 | 03 |—01 |—00 | 04 | 83 
69. 59 |-22| 04|—41| 05| 05) 10—05 |—o1| 03, 03) 06 —06 |—04 | 59 
70. 40 |-07| 20| 08|—23 | 09| 22|—10, 01 01 | 04| 02 |—08 |-03 | 34 
T13 33 |-40| 03| 04|—-08| 03| 12|-04 13|-26 |-15 | 26| 01| 02| 47 
72. 62 18| 04|—17|-08 | 07 |—02 | 03 |—00 04|—22 |-10 | 18| 04| 55 
73. 54 |—07 | 01|—06 |-21 | 06 |-01 | 11 04 |-07 |-34 |-05 |-17 |-00 | 52 
74. 59 16| 23| 09| 01| 28|-22| 04 |—26 02|—04 |—03 | 07 | 02| 64 
75. 41 |—32 | 11| 11 |—04 |-05 |—26 —95| 12| 09|—06 |-13 | 02 —15 | 49 
76. 49 |-39 | 20| 21| 11|—09 |-26 —91| 34| 12| 12|-03 |-05| 16| 78 
77. 62 o2 |—15 |—01 | 01| 05 |—07 | 20 16 |—05 |—06 | 06 |—12 |—01 | 50 
78. 55 |-01 |-16 | 01 |—23 |-12 05| 01| 08|—07 |-03 |-00 | 17 —02| 44 
79. 52 | 00) 17|-08|-25 | 07| 05| 24 03 | 03 | 25 |—28 |-07 |-O1 | 58 
80. 36 | 09 |—34 | 09 |—22 |-09 |-04 |—16 —17| 16| 24| 06 |—02 |—09 | 47 
81. 62 | 20|-02| 03| 17| 03 —14 |-03 | 07| 12|—05| 05 |—17 —10 | 54 
82. 64 17| 14| 05—00 | 08 10| 04| 21|-03| 09| 10| 16 —04 | 57 
83. 64 | 25| 03|—14| 04| 03 |—14 —07| 00| 10| 07|- 15 |—11 |-17 | 60 
84. 63 | 02| 20| 03| 15|—25 | 08 11 |-15 |-14 | 11 | 04 |—09 |-03 | 61 
85. 65 | 13| 20| 02| 26|-32 02| 01| 04|—-32| 08|—09| 06 —02 77 
86. —o6 |—43 | 03 |—17 | 08 |—08 |-07 08 |—06 | 02 |—03 | 05| Ol | 67 
s7. 10 zo is —04|—09| 09| O01} 08| 08 —05| 13| 08|-11| 10| 70 
88. 66 | 17| 01 |—18 |-08 |-12 —09 | 09|—04|—09 | 07 | 02, 08 —=12 | 57 
89. 47 | oœ | 09| 26|—00| 05 39 |-03 | 22| 07 |—01 -11 | 11 —18 | 57 
90. 65 |-01|-30 |-06 |-11 | 03 —06|-06|—04|-17| 12 |—11 | 06 07 | 60 
91. 40 | 02| 19| 11| 08 30|-11| 37| 12| 07 0g |—00 |—01 | 16 | 50 


Note.—Decimal points omitted. 


Interpretation of the Symbolic Reference Factors


CSU—Cognition of symbolic units:

12. Disemvowelled Words (CSU) .55
9. Correct Spelling (ESU) .47
47. Word Transformation (NST) .35
(.43 NST)
44. Word Combinations (CSU) .32
34. Sound Grouping (ESC) .30
(.41 ESC; .38 CMU)
Disemvowelled Words once again leads 
TABLE 8
ROTATED SEMANTIC FACTOR MATRIX


Test name CMU}CMC|CMR|CMS |CMT|CMI |DMI|NMT|EMU|EMC|EMR EMS |EMT |EMI | h? 
51. Reading Vocabulary CMU | 70 | 19| 11| 47| 08/|—03| 15|—03, 03| 17) 10|—04| 06|—02| 84 
52. Arithmetic Reasoning CMS | 40 | 24| 14| 57| 02) 01| 15| 07|—02| 19) 26| 02) 03| 09) 70 
53. Apparatus Test EMI 01 |—06} 10| 26| 18| 34| 19| 14| 09) 13| O1| 06] 20| 10|36 
54. Best Trend Name EMR 29 | 12) 07| 28| 05| O7, O7| 19| O4| OS| 48| 14| 10| 29) 58 
55. Best Word Class EMC 16| 19| 23| 22) 05| 12,—006|—03|—02, 54| 20| 04| 22| 18| 60 
56. Best Word Pairs EMC 16 | 34| 15| 21) 02, 05| 06| 08) O6| 17| 25| 12) 13) 23) 40 
57. Class Name Selection EMC | 31 | 21| 07| 05| 00| 07, 08| 16| 10| 50| 17| 11| 12) 29| 58 
58. Commonsense Judgment I | 21 | 21|—01| 07| 11| 01| 09,—09| 21| 33| 11| 14| 18| 33) 45 
EMI 
59. Commonsense Judgment II | 11 |—05| 22, 18,—08| 28|—04|—05| 07| 14| 01| 10| 15|—13| 26 
EMI 
60. Complete Thoughts EMS 27| 11) 07| 10 04| 07| 10| 03| 12| 28| 17) 03| 20| 55| 58 
61. Double Descriptions EMU | 05 | 14| 05| 16) 13| 06|—07| 27| 64| 05| 09| 08) 15| 07| 60 
62. Important Facts EMS 17 | 24 oy 09) 10) Ol| 16) OL) 18) 33| 17) Ol| 09) 28) 39 
63. Logical Reasoning EMR 23 | 16} 12) 18) 17|—02, 02) 27| 13| 18) 32, 09} OS| 40 55 
64. Matched Verbal Relations) 32 | 26| 18| 16| 08,—05| 08| 16| 23| 14| 50| 13| 12| 34| 74 
EMR 
65. Pertinent Questions (Part I) 17 | 05|—06| 10| 20| 75| 08| 08|—07| 04|—12| 09| 09| O1| 70 
CMI 
66. Pertinent Questions (Part II) 18 | 22|—09| 07| 20| 68| 17|—05| 04|—01| 11| O7| 17| 12| 69 
CMI 
E^ Picture Gestalt NMT 14| 13| 06) 25) 12) 23| 24| 37| 18| 03|—03| 03| 15|—04| 42 
3. Possible Jobs (Part I) DMI | 12 | 05| 11| 07| os| 20| 82| 16| Ol| O7| 12| os| 02| 19| 84 
69. Possible Jobs (Part IT) DMT | 24 | 14| 04| 06) 03| 21| 56| 15| 09| 06| 27| 08| 09 18| 60 
70. Product Choice EMT 07 | 07| 24) 04) 09| O4| 16) 14| 39) 13| 06} 12| 17| 06) 34 
71. Seeing Problems EMI -24 | 08) 18|—11| 08) 43) 11| 22| 27| 05) 09|—07|—15|—03| 47 
/2. Sentence Selection EMI 27 | 07| 16|-20,—02,—02, 15| 09! 13] 05) 25) 10| 03) 54| 56 
‘Sentensense EMS 29 | 04 05| 08|—01| 22) 11|—12| 43| 05) 25) 11|—02, 29 52 
hip Destination Test CMS | 08 | 13| 02| 50| 05| Ol|—04| 24) 22| 00| 33| 25| 11) 28) 65 
‘Similarities (Part I) CMT |11| 05| 07| 10| 53| 25| 07| 00| 16|—08| 18|—04| 17| 16) 50 
milarities (Part II) CMT | 15 | 13| 09| 03| 75| 24| 09| 14| 04) 18| 15| 17|—01| 05| 18 
ory Titles EMT 43 | 12) 09) 07| 02| 17| 05|—05| 09| 17| 36| 25| O1| 17| 50 
ynonyms EMU 39 | 10| 20|—09, 03| 14| 00| 11| 12| 11| 12) 05| 20| 35) 45 
Unlikely Things EMS 15 | 18) 12} 04 O7, Ol| 19) 01) 20|—Ol1| 03| 55| 27| 25| 59 
eful Changes EMT 28 |—04/—06/—02) 02/—04/—05| 11| 03| 19| 16|—03| 55| 07| 47 
81. Es Analogies I (Part I) | 26 | 18| 06| 26) 14|—04| 08|—08| O7| 22| 50| 10 10| 14| 55 
82. ho 4 I (Part II) 24| 22| 43| 09) 07,|—03| 07| 20| O7| 15| 30| 22| 06| 23| 57 
83. Verbal Analogies IIT EMR | 19 | 19} 06| 15| 04|—09| 17| O7| os| 14| 59| 10| 22 19| 59 
84. Mee O (Part I)| 12| 57| 05| 14| 01] 16| 12| 09| 12| 26| 21| 14| 13| 17| 69 
85. Verbal Classification (Part | 25 | 71| 15| 10| 13| 03 —01| 28 76 
TD CMG 06) 09| 03| 13| 18| 07|—01 
86. Verbal Comprehension CMU | 67 |-04| 06| 04| 10| 14| O1| 12| 09| os| 21| 08| 26| 21| 68 
87. Word Completion CMU 63 | 04 02 Ol) 02, 18| 14| 14| 09] 23| 26| 25| 19| 09, 70 
88. Word Extensions EMI 27 | 32) 07| 02,—06| 04| 11| 11| O0| O4| 37| 16| 22| 39| 58 
89. Word Linkage EMR 20| 13| 60| 13| 10|—01| 07|—06| 19| 18| 00| 09| 18| 11| 57 
90. Word Substitution EMU 60| 15.—02; 07| 05| O7| os| 20| OS| 03| 13| 13| 25| 24| 60 
91. Word Systems EMS 09| Ol) 15| 24| 07} 10.—03| 09| 04| 10| 19| 57|—11| 10| 51 


Note.—Decimal points omitted. 
the tests that are loaded on the CSU factor.
It would appear from the test’s history that 
the factor is concerned with the recognition 
of complete and correct words; a factor 
that might be called “word closure” (Pem- 
berton, 1952). Such a conclusion is strength- 
ened by the fact that Correct Spelling, 
designed as a measure of ESU, is also a 
measure of the recognition (evidently not 
in an evaluative way) of complete and cor- 
rectly spelled words. 

Tests for both CSU and NST were cor- 
related in the previous analysis (Guilford, 
Merrifield, Christensen, & Frick, 1961) in 
which those factors were discovered. From 
Word Transformation’s correlation with 
the CSU factor, although secondary, it ap- 
pears that this symbolic redefinition task is 
dependent upon recognition of the symbolic 
units needed in effecting the transforma- 
tion. The recognition of symbolic units 
necessary for sounding out words may ac- 
count for the CSU loading for Sound Group- 
ing. 

CSC—Cognition of symbolic classes:

4. Best Number Pairs (ESC) .54
23. Number-Group Naming (CSC) .47
(.38 NSS; .35 ESC)
21. Number Classification (CSC) .43
32. Sign Changes II (ESR) .35
(.43 E
3. Best Number Class (ESC) .32
(.50 ESC; .32 ESS)
22. Number Grouping (DSC) .31
(.34 EST)
The two shortened forms of the tests se- 
lected to measure the CSC factor function 
as anticipated, but the tests for CSC are led 
by a test designed for ESC. Like its analo- 
gous semantic test, Best Word Pairs, Best 
Number Pairs contributes more to the cog- 
nition-of-classes factor than to its parallel 
evaluation factor. Best Number Pairs asks
E to evaluate which pair of stimuli makes 
the best class, but the specific properties of 
the best class, in other words, the specific 
criteria, are not defined in each item. Since 
the specific nature of the best class needs to 
be discovered for each item, cognition abili- 
ties should be expected to determine much 
of the test’s variance.
The CSC factor defined in this analysis 
could be confined to the ability to recognize 


common properties of numbers, but a letter 
test has previously been loaded on it (Guil- 
ford, Merrifield, Christensen, & Frick, 
1961). The significant loadings showing 
that three tests share relations to both CSC 
and ESC impressively demonstrate that it 
has not been easy to measure one of these 
factors without also measuring the other. 
CSR—Cognition of symbolic relations:

29. Seeing Trends II (CSR) .46
(.35 CMU)
16. Jumbled Words (EST) .35
(.48 EST)
46. Word Relations (CSR) .31
(.30 ESI)
43. Word Choice (ESC) .31
(.32 CSI; .31 ESC)
Both CSR tests function as anticipated 
in this analysis. The significant CMU side 
loading for Seeing Trends II is not reason- 
able as the trends are based solely on the 
letter content of the words and not on their 
meanings. The result is consistent with a 
similar CMU side loading found by Guilford,
Merrifield, Christensen, and Frick
(1961), however. Some semantic recogni- 
tion must somehow be involved in the test. 
Depending upon anagrammatic rear- 
rangements of letters of words, and the fact 
that some letters characteristically do or do 
not appear together, responding to Jumbled 
Words may involve CSR ability before 
evaluation enters the process. The common
properties of many of the items of Word 
Choice were literal relationships—again in- 
dicating a need for CSR ability for re- 
sponding. 
CSS—Cognition of symbolic systems:

18. Letter Triangle (CSS) .39
33. Similar Pairs (ESR) .36
(.35 ESR)
6. Circle Reasoning (CSS) .30

Letter Triangle leads the factor called 
CSS, the ability to comprehend systematic 
arrangements of symbols. The unusually 
small loading of Circle Reasoning does not 
strengthen this factor much. The implica- 
tion derived from only slightly successful 
attempts to isolate CSS is that additional 
measures should be developed and analyzed 
so that stronger measures that consistently 
cohere are available. 

Similar Pairs’ loading on the CSS factor
should be accounted for by cognition that
might be necessary to recognize the rela- 
tions within word pairs. The determination 
of some of these relations is dependent upon 
alphabetical order, which also determines 
the systems in Letter Triangle. 
CSI—Cognition of symbolic implications:

45. Word Patterns (CSI) .42
28. S Test (ESI) .41
35. Symbol Grouping (CSI) .33
43. Word Choice (ESC) .32
(.31 CSR; .31 ESC)
The ability to foresee symbolic implica- 
tions, CSI, is defined in this analysis by the 
two tests designed to measure the factor and 
also by the S Test, a test originally designed 
to measure sensitivity to problems. The 
original F test, from which the S Test was 
derived, probably did not indicate the traditional
“sensitivity-to-problems” factor
because that factor has been semantic. The 
semantic tests that did bring out the sensi- 
tivity-to-problems factor were later shown 
to belong largely on factor CMI, the cogni- 
tion ability parallel to CSI, on which the S 
Test is now loaded. It appears that there is 
no difference between the ability to see sym- 
bolic implications and being sensitive to 
them. 
CMU—Cognition of semantic units:

50. SCAT Verbal (CMU)              .79
53. PSAT Verbal (CMU)              [illegible]
52. ITED General Vocabulary (CMU)  .57
      (.31 ESS)
34. Sound Grouping (ESC)           .38
      (.41 ESC; .30 CSU)
29. Seeing Trends II (CSR)         .35
      (.46 CSR)
 7. Correct Letter Orders (ESS)    .32
      (.43 ESS)

The verbal-comprehension factor is
clearly defined by the three tests selected to
measure CMU. The tests are relatively
univocal, having very high loadings on
CMU.

Once again, Sound Grouping demon- 
strates its factorial complexity by splitting 
its variance between ESC, CSU, and CMU. 
Familiarity with the words used as stimuli
for this test, in both their symbolic and
semantic aspects, appears to facilitate
either the pronunciation or the classification
based upon the pronunciations. One
might expect an analogous phenomenon 
underlying the CMU loadings of Seeing 
Trends II. But knowledge of meanings of 
the words in that test in no way aids in the 
discovery of the symbolic trend. The only 
common element in the CMU tests and 
Seeing Trends II is that words are em- 
ployed. 

MSI—Memory for symbolic implications:

24. Numerical Operations (MSI)    .61
      (.33 NSI)
31. Sign Changes (NSI)            .33
      (.30 NSI; .32 ESU)

Numerical Operations emerges clearly as
the test defining the numerical-facility factor.
MSI receives further support from Sign
Changes, a test designed to measure NSI,
but which has consistently shared MSI
variance. The shared variance on these two
factors is not great enough, however, to 
cause serious concern regarding their dis- 
tinctness. 

DSC—Divergent production of symbolic
classes:

40. Varied Symbols (DSC)          .46

Varied Symbols once again serves as the
principal measure of the DSC factor. Number
Grouping, which was not loaded on
DSC in a previous investigation (Hoepfner
& Guilford, 1965), again failed to load on it. The
strong face validity of Number Grouping 
as a measure of DSC apparently is mis- 
leading, as the test is complex in this analy- 
sis. 

NSS—Convergent production of symbolic
systems:

42. Word Changes (NSS)            .56
      (.32 ESS)
23. Number-Group Naming (CSC)     .38
      (.47 CSC; .35 ESC)
30. Series Relations (ESS)        .31
      (.48 ESS)
25. Operations Sequence (NSS)     .30

The two tests selected to measure NSS
perform again in a reliable manner. The 
side loading of Word Changes on ESS 
might be due to a strategy that employs 
alternate orderings that are quickly eval- 
uated according to the limitations imposed 
by criteria given in the test instructions. 
A similar kind of rationale, in reverse, 
could explain the minor secondary NSS 
loading of Series Relations. The E tries out 
each given operation in turn, producing a 
fully determined series. 

NST—Convergent production of sym- 
bolic transformations: 


 5. Camouflaged Words (NST)       .62
47. Word Transformation (NST)     .43
      (.35 CSU)
 3. Best Number Class (ESC)       .31
      (.32 CSC; .50 ESC; .32 ESS)


The symbolic redefinition factor emerged 
in this analysis with both tests selected to 
measure it loading primarily upon it. The 
CSU side loading of Word Transformation 
was discussed in connection with the CSU 
factor. 

NSI—Convergent production of sym- 
bolic implications: 


14. Form Reasoning (NSI)          .59
31. Sign Changes (NSI)            .30
      (.33 MSI; .32 ESU)


The two tests selected to measure NSI 
perform as expected. To date, this solution 
represents the clearest separation between 
the MSI and NSI factors, although there 
still seems to be some common aspect in
the tests used here to measure them and 
more univocal tests are apparently needed 
for both. 

EFU—Evaluation of figural units: 


26. Perceptual Speed (EFU)        .69
15. Identical Forms (EFU)         .63
11. Derivations (ESU)             .31
      (.34 ESU)

The two EFU tests perform just as
hypothesized, being univocal and highly
saturated with common-factor variance.
The EFU factor is the clearest interpretable
factor to emerge in this analysis, probably
due to its marked dissimilarity of content
from the symbolic and semantic
factors.


Interpretation of the Symbolic Evaluation
Factors


ESU—Evaluation of symbolic units: 


36. Symbol Identities (ESU)       .62
19. Letter “U” (ESU)              .56
11. Derivations (ESU)             .34
      (.31 EFU)
20. Marking Speed Test            .33
31. Sign Changes (NSI)            .32
      (.33 MSI; .30 NSI)


The ESU factor emerged with remarka- 
ble clarity. The two speeded sensitivity 
tests that were previously suggested as 
measures of ESU (Guilford & Hoepfner, 
1963) lead the factor with high loadings 
and little or no complexity. It is interesting
to note that with strong tests of EFU
and a sufficient number of tests for ESU,
including Symbol Identities and Letter
“U,” in the analysis, the two factors separate
very clearly. This decisive result
clears up earlier uncertainties as to whether 
"perceptual-speed" tests composed of lit- 
eral material should go with Thurstone’s 
original perceptual factor or should rep- 
resent a separate factor (Bechtoldt, 1947; 
Coombs, 1941; Thurstone, 1938b). 

The third ESU test loaded on the ESU 
factor is Derivations, also a sensitivity 
test, involving words. Based upon the three
ESU tests loading on this factor, it appears 
that ESU is the ability to make rapid de- 
cisions regarding the symbolic identity or 
accuracy of words, letter sets, and number 
sets. In Symbol Identities, there is a comparison
of two given symbolic units to
determine whether or not they are identical.
In Letter “U,” a word class is specified
(words containing the letter “U”), with E
to say whether or not each word satisfies
the specification. Symbol Identities is a
direct parallel to figural tests of EFU, in
which figures are to be compared, with E 
to decide whether or not they contain ex- 
actly the same elements. 

Derivations does not fit either of the two 
item models just described for Symbol 
Identities and Letter “U.” The things being
compared are not exactly the same except
for one element, as in the former, nor are
class specifications given, as in the latter.
The letters of the short word said to be
extracted from the long word must coin- 
cide with a completely identical set of 
letters in the long word, except that the 
order is probably different. There is no 
clear model presented for comparison. This 
may be a reason for the lower ESU loading 
for Derivations than for Symbol Identities. 

The two ESU “misses,” those tests hypothesized
for ESU but not loaded significantly
on it (Correct Spelling and Familiar
Letter Combinations), also aid in interpreting
the ESU factor by indicating what
preting the ESU factor by indicating what 
ESU is not. One characteristic the two 
“misses” have in common but do not share 
with the other three ESU tests, is that the 
things with which comparisons must be 
made are not given on the printed page. 
They can only be compared with something
in memory storage or something retrieved
from memory storage, perhaps in
the form of an image. In Correct Spelling,
the needed model is the remembered correct
spelling of each word. The task boils down
to the question of how many of the 120
words in the test does E know, spelled correctly.
This statement of the task makes it
appear like a measure of cognition, as it
turned out by analysis to be.

In Familiar Letter Combinations, the
criterion for judgment is familiarity of the 
syllables or their observed probability of 
occurrence in E’s experience. In this test, 
there is no clear model for E to use, and 
what he has to use is also something in 
or from memory storage. Although it ap- 
pears that the memory feature applies es- 
pecially to the two ESU tests that missed, 
the question arises as to how general the 
implied evaluation principle is. If it is quite 
general, the principle that comparisons 
must be between perceived information 
would place an important restriction upon 
the definition of evaluation abilities.

The presence of Marking Speed, along 
with Sign Changes, here suggests some con- 
founding of a finger-speed factor with both 
NSI and ESU. Rotation to an additional
finger-speed factor might have cleared up
the picture for both NSI and ESU. Implied
is a general principle, namely, that
the appearance of obliqueness among fac- 
tors may be due to lack of a sufficient 
number of dimensions being included in 
orthogonal rotations. 

ESC—Evaluation of symbolic classes: 


 3. Best Number Class (ESC)       .50
      (.32 CSC; .32 ESS; .31 NST)
32. Sign Changes II (ESR)         [illegible]
      (.35 CSC)
34. Sound Grouping (ESC)          .41
      (.38 CMU; .30 CSU)
23. Number-Group Naming (CSC)     .35
      (.47 CSC; .38 NSS)
43. Word Choice (ESC)             .31
      (.32 CSI; .31 CSR)
37. Symbol Manipulation (ESR)     .31
      (.59 ESR)


Although three tests designed to be meas- 
ures of the ESC factor are loaded on the 
factor called ESC, the factor is the least 
clear of all the new experimental factors 
found. The ESC tests are all complex. 

The task involved in Best Number Class 
is to realize the numerical classifications of 
given numbers and then to select the one 
classification that is most valuable, value of 
number classes being defined by the test as 
the criterion. 

Sign Changes II requires the examinee 
to change signs in a numerical expression 
so that the expression becomes an equation. 
Introspectively, it seems that a successful 
attack on such problems would include the 
tactic of becoming aware of what both
sides of the expression have potentially in 
common, and from this awareness, to make 
the appropriate sign changes to bring about 
that common numerical value and thus 
change the expression into an equation. To 
clarify this attack with an example, consider
the sample item: 3 + 1 = 6 × 2. The
first step in effectively changing this expression
into an equation is not to substitute
signs, but to be sensitive to what
the pair of numerical values 3 and 1, and
the pair 6 and 2 potentially have in common.
Their common, or class, property is
either the numerical value of 4 (3 + 1 and
6 − 2), or 3 (3 × 1 and 6 ÷ 2). Since only
one of the necessary sign changes is given 
as an alternative answer, the only accepta- 
ble solution is the first one, and the com- 
mon element in the expression is the value 
of 4, the value for which the signs must be 
changed. But this line of thinking suggests 
cognition rather than evaluation, factor 
CSC rather than ESC, and the loading for 
this test on CSC is only .35. Another hy- 
pothesis is that E somehow takes the of- 
fered solutions as classes of operation 
changes and considers them for adequacy. 
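
The search for a common value in the sample item 3 + 1 = 6 × 2 can be sketched in a few lines of Python. This is only an illustration of the introspective account, not the scoring procedure of Sign Changes II; the function name `common_values` and the choice of four operators are assumptions made here. The sketch tries every operator on each number pair and reports the values the two sides can share.

```python
from fractions import Fraction
from itertools import product

# Candidate operators (an assumption for illustration).
# Division uses exact fractions so that no rounding occurs.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "×": lambda a, b: a * b,
    "÷": lambda a, b: Fraction(a, b) if b != 0 else None,
}

def common_values(left, right):
    """Map each shared value to the (left_op, right_op) pairs producing it."""
    matches = {}
    for (lop, lf), (rop, rf) in product(OPS.items(), repeat=2):
        lv, rv = lf(*left), rf(*right)
        if lv is not None and rv is not None and lv == rv:
            matches.setdefault(lv, []).append((lop, rop))
    return matches

result = common_values((3, 1), (6, 2))
for value, pairs in sorted(result.items()):
    print(value, pairs)
```

The sketch finds the two class values named in the text, 4 and 3. It also lists 3 ÷ 1 = 6 ÷ 2 as a further route to the value 3, a case the text does not mention.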
The two ESC tests with the smaller sig- 
nificant loadings on this factor were con- 
structed as estimation tests of ESC. They 
involve making a choice among given class 
properties on the basis of criteria supplied 
in the test. Although it would seem reasona- 
ble to have rotated the ESC axis to maxi- 
mize its correlations with these more simple 
ESC tests, their tendency toward factor 
complexity and the loss of simple structure 
weighed against such a move.

ESR—Evaluation of symbolic relations:

37. Symbol Manipulation (ESR)     .59
      (.31 ESC)
27. Related Words I (ESR)         .43
33. Similar Pairs (ESR)           .35
      (.36 CSS)


Three of the four tests designed to meas- 
ure ESR, and no others, are loaded on the 
ESR factor, clearly defining it as repre- 
senting the ability to make choices among 
symbolic relationships on the basis of simi- 
larity and consistency. The significant side 
loadings of the ESR tests were mentioned 
before in connection with the respective 
factors upon which the complex ESR tests
are loaded. 

ESS—Evaluation of symbolic systems: 


41. Way-Out Numbers (ESS)         .57
30. Series Relations (ESS)        .48
      (.31 NSS)
 7. Correct Letter Orders (ESS)   .43
      (.32 CMU)
38. Symbol Reasoning (ESI)        .41
 3. Best Number Class (ESC)       .32
      (.50 ESC; .32 CSC; .31 NST)
42. Word Changes (NSS)            .32
      (.56 NSS)
 8. Correct Number Series (ESS)   .31
48. ITED Verbal (CMU)             .31
      (.57 CMU)


Four of the five tests designed for ESS 
came out significantly loaded on ESS. The 
two leading tests, Way-Out Numbers and 
Series Relations, are in the estimation cate- 
gory and are composed of numbers. The 
two with distinctly smaller loadings, Correct
Letter Orders and Correct Number
Series, are in the sensitivity category, one 
being a number test and the other a letter 
test. It may be of some interest that while 
sensitivity tests proved to be better for 
ESU, estimation tests proved better for 
ESS. The trend must be better supported, 
however, before a principle can be stated.
It is probably significant, however, that 
a system can deviate from a standard of 
comparison much more readily by degrees 
than can a unit. 

Series Relations and Way-Out Numbers 
differ somewhat in the operations that they 
require. In the former, E probably makes 
new systems (series), following rules 
given in the alternative responses. He then 
compares each new system with the model 
that is given, deciding which new one comes 
nearest. The criterion is the degree of close- 
ness of one set of numbers to another set. 
In the latter test, he is virtually to compare 
the distances of the first and last numbers
in an irregular series to decide which one 
is farther from the other numbers. It is as 
if he were treating the same series first as 
one system and then as another, or from 
one point of view then from another. The
criterion is numerical distance. The difference
between these two relatively strong
ESS tests contributes some breadth to the 
nature of the factor. 
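
The numerical-distance criterion described for Way-Out Numbers can be made concrete with a small Python sketch. The text does not say how "farther from the other numbers" is measured, so the sketch assumes the mean absolute distance from an end number to the interior numbers of the series. The function name `farther_end` and the sample series are hypothetical and are not items from the test.

```python
def farther_end(series):
    """Say which end number lies farther from the interior numbers.

    Distance is taken as the mean absolute difference between an end
    number and every interior number (an assumed measure).
    """
    if len(series) < 3:
        raise ValueError("need at least one interior number")
    interior = series[1:-1]

    def mean_distance(x):
        return sum(abs(x - y) for y in interior) / len(interior)

    d_first = mean_distance(series[0])
    d_last = mean_distance(series[-1])
    if d_first == d_last:
        return "tie"
    return "first" if d_first > d_last else "last"

print(farther_end([2, 10, 11, 13, 30]))   # last number is the outlier
print(farther_end([1, 20, 22, 24, 26]))   # first number is the outlier
```

The same series is treated once from the standpoint of its first number and once from that of its last, which corresponds to the two points of view mentioned above.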

In terms of appearance, the two weaker 
ESS tests are close to being alternate forms 
of the same test, and both differ from the 
two stronger tests, as indicated above. 
They present letter series in the one case 
and number series in the other, with a 
verbal description of the principle that
should be satisfied in a series. Sometimes
the series follows the principle exactly,
sometimes not. It is probably significant 
that Correct Letter Orders has a signifi- 
cant loading on CMU whereas Correct 
Number Series does not. It matters more 
whether verbal terms are correctly under- 
stood in the letter test than in the number 
test. In the latter the rule can be more 
simply and precisely stated. 

Symbol Reasoning would seem to be an 
ideal type of test for ESI, for it requires E 
to decide whether conclusions, expressed in 
symbolic form, can or cannot be justifiably 
drawn from other symbolic statements in
the form of equations or inequalities. In 
essence, this test would seem parallel to 
the verbal-syllogistic test, Logical Reason- 
ing, which had been found to measure fac- 
tor EMI. It should be particularly inter- 
esting to learn why the symbolic form of 
the test was loaded instead on a systems 
factor. 

The verbal test, Logical Reasoning, 
presents syllogisms, the relations or impli- 
cations of which are contained in the 
premises and can be extracted by stand- 
ardized tactics or rules of logic. The 
method of responding to the symbolic test, 
Symbol Reasoning, however, involves the 
estimation of an ordered series containing 
each element of the premises so that other 
relationships between values can be judged. 
It appears from this analysis that the con- 
struction (estimation) of a vaguely or- 
dered number series is the process that 
discriminated most Es, and that is why the
major variance of this test is shared with 
the ESS tests. 

EST—Evaluation of symbolic transformations:

16. Jumbled Words (EST)           .48
      (.35 CSR)
10. Decoding (EST)                .37
22. Number Grouping (DSC)         .34
      (.31 CSC)
39. Typing Errors (EST)           .30
46. Word Relations (CSR)          .30
      (.31 CSR)


The three tests designed to measure EST 
emerged on the EST factor with unexpected 
clarity. The construction of the EST tests
had proved to be most difficult and therefore
it was expected that EST might not be
found in this analysis. 

All three tests involve the use of words, 
but they differ somewhat in terms of oper- 
ations that E probably performs as he 
takes the tests. In Jumbled Words, he de- 
cides whether each response word could 
have come from the given word merely by 
rearrangement of the letters it contains.
The criterion is in terms of a certain invari- 
ance or of element identity under the 
transformation of changed order. 
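
The criterion of element identity under reordering, as described for Jumbled Words, amounts to comparing the letter counts of two words. A minimal Python sketch follows; the function name `is_rearrangement` and the example words are hypothetical and are not taken from the test.

```python
from collections import Counter

def is_rearrangement(given, response):
    """True if `response` uses exactly the letters of `given`, in any order."""
    return Counter(given.lower()) == Counter(response.lower())

for candidate in ["least", "slate", "stall"]:
    print(candidate, is_rearrangement("stale", candidate))
```

Comparing counts rather than sets matters here: "stall" shares its distinct letters with "stale" except for one substitution, and only the count comparison catches the repeated letter.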

In Decoding, E is expected to apply cer- 
tain rules, of which five are given, in coding 
the letters of each of two words into a 
sequence of digits. The transformation in 
each case is in the form of substitution of 
elements according to rules. The E then is 
expected to decide which coding (trans- 
formation) result could be most easily and 
correctly decoded. The difference in ease 
and correctness of decoding depends upon 
the approach of the substitutions for a 
word to univocality, under the rules. 

In Typing Errors, E is presented with 
what he is told is a misspelled word. From
a knowledge of the arrangement of letters
on the keyboard of a typewriter, he is to
decide which transformation in spelling
(substitution of one letter element) has
most probably occurred.
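
The keyboard-nearness criterion of Typing Errors can be illustrated with a short Python sketch. The item format is not given in the text, so several things are assumed here: a standard QWERTY letter layout, a half-key shift for each lower row, straight-line distance between keys, and a set of candidate words each differing from the typed word in one letter. The names `key_distance` and `most_probable_word` and the sample words are hypothetical.

```python
import math

ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
# Assumed geometry: each lower row is shifted half a key to the right.
KEY_POS = {
    ch: (r, c + 0.5 * r)
    for r, row in enumerate(ROWS)
    for c, ch in enumerate(row)
}

def key_distance(a, b):
    """Straight-line distance between two lowercase letter keys."""
    (r1, c1), (r2, c2) = KEY_POS[a], KEY_POS[b]
    return math.hypot(r1 - r2, c1 - c2)

def most_probable_word(typed, candidates):
    """Pick the candidate whose single differing letter is nearest on the
    keyboard to the letter actually typed; None if no candidate qualifies."""
    best, best_d = None, math.inf
    for word in candidates:
        if len(word) != len(typed):
            continue
        diffs = [(t, w) for t, w in zip(typed, word) if t != w]
        if len(diffs) != 1:
            continue  # only one-letter substitutions are considered
        d = key_distance(*diffs[0])
        if d < best_d:
            best, best_d = word, d
    return best

print(most_probable_word("cst", ["cat", "cut", "cot"]))
```

In this sketch "cat" is preferred because the keys for "a" and "s" are adjacent, whereas "u" and "o" lie several keys away from "s".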

To summarize these comparisons, the 
transformation is in the form of reorder- 
ing of letters in one test and in the form 
of letter substitutions and letter-digit sub- 
stitutions in the other two tests. The cri- 
teria appear to be identity of elements in 
spite of transformation in one test, uni- 
vocality of reference in the coding test, and 
nearness of position in symbolic system 
(keyboard) in the third. These differences 
suggest that there is some scope in kinds of 
transformations and in criteria for decision 
involved in connection with factor EST. 

The two tests not hypothesized for EST, 
Number Grouping and Word Relations, 
share variance with EST tests possibly 
through common CST variance. CST has 
not been reported to be isolated as a factor, 
but the recognition that a change has 
occurred or the ability to see the changes 
may be common to all five tests on this factor.
The factor might therefore be a confounding
of EST with CST.

ESI—Evaluation of symbolic implications:

 2. Best Letter Set (ESS)         .58
 1. Abbreviations (ESI)           .47
17. Letter Problems (ESI)         .38


The ESI factor is defined by two tests 
designed as measures of ESI, but is led by 
a test designed for ESS that is not loaded 
on that factor. The most obvious common 
characteristic of the two leading tests is 
that the given stimuli are sets of letters, 
but this is also true of many other tests in 
the battery. But in addition, the item for- 
mats of the two tests are similar. In Ab- 
breviations, a sequence of three or four 
letters is presented as the potential abbre- 
viation of three alternative, familiar words, 
with E to say for which word the abbrevia- 
tion best stands or that it best implies. In 
Best Letter Set, a sequence of three or four
letters is given and three alternative letter
sets of the same length, with E to select the one
that is most similar to the given set by 
virtue of common properties. The test was 
expected to be a measure of ESS because 
it was thought that the nearness of one set 
to another in terms of their constitutions 
would be a matter of comparing systems 
for approach to identity in terms of sys- 
temic properties. 

It is not very clear how Best Letter Set 
becomes an implications test rather than a 
systems test. It is little more than stating 
the obvious to say that the given set ap- 
parently implies the best alternative, simi- 
larly to the way in which a letter set in 
Abbreviations implies a longer letter set 
in the form of a real word. A revised for- 
mat of Best Letter Set, giving alternatives 
that either do or do not fit the principle of 
the given set exactly, might have been a 
better ESS test. But there would still be 
much unanswered about underlying reasons 
for the difference in factor content of the 
two test formats.

Letter Problems holds some promise of 
univocality, so far as this analysis goes, 
but with a low loading of .38 and a
reliability of .88, there is considerable
room for additional common-factor content.
Letter Problems is interesting for the
fact that it was developed by analogy to
Form Reasoning, which is a marker test 
for factor NSI. A major difference, which 
should not be factorially significant, is 
that Letter Problems uses letters as sym- 
bolic elements whereas Form Reasoning 
uses geometric forms as symbolic elements. 
The significant difference is that Form 
Reasoning calls for inferences or conclu- 
sions to equations of a certain type whereas 
Letter Problems asks for decisions as to 
whether a problem equation is solvable, or 
is solvable with a minor change in the 
problem. It might be said that the cri- 
terion for evaluation is solvability or the 
possibility of valid inferences or impli- 
cations. This is not strictly a matter of 
judging the value or identity of an impli- 
cation, as such, or its similarity to an- 
other implication, which are common 
kinds of criteria in tests of other evalua- 
tion factors. But if a new kind of criterion 
is involved and is crucial to the loading on 
ESI, we have a little extension of con- 
notation of evaluation abilities and the 
evaluative process as envisaged from the 
psychometric approach. 
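
The "room for additional common-factor content" in Letter Problems can be put as a rough worked figure from the reported loading of .38 and reliability of .88. The calculation assumes an orthogonal common-factor model and ignores the test's nonsignificant loadings on other factors; both are simplifications made here, and the symbols $a_{\mathrm{ESI}}$ (loading on ESI) and $r_{tt}$ (reliability) are introduced only for this illustration.

```latex
a_{\mathrm{ESI}}^{2} = (.38)^{2} \approx .14,
\qquad
r_{tt} - a_{\mathrm{ESI}}^{2} \approx .88 - .14 = .74
```

On these assumptions roughly three quarters of the test's variance is reliable yet not accounted for by factor ESI.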


Interpretation of the Semantic Reference 
Factors 

CMU—Cognition of semantic units:

51. CAT Reading Vocabulary (CMU)     .70
      (.47 CMS)
86. Verbal Comprehension (CMU)       .67
87. Word Completion (CMU)            .63
90. Word Substitution (EMU)          .60
77. Story Titles (EMT)               .43
52. CAT Arithmetic Reasoning (CMS)   .40
      (.57 CMS)
    Synonyms (EMU)                   .39
      (.35 EMI)
64. Matched Verbal Relations (EMR)   .32
      (.50 EMR; .34 EMI)
57. Class Name Selection (EMC)       .31
      (.50 EMC)

In factor analyses of semantic-test batteries,
it is not surprising to find a number
of tests loaded on the CMU factor. Word 
knowledge is the essence of factor CMU— 
the well-known factor of verbal comprehen- 
sion. The achievement tests appear to be 
factorially complex, spreading their vari- 
ances over CMU and CMS, the factors 
most important for academic performance. 

The Verbal Comprehension test used in 
this study was so constructed as to elimi- 
nate some of the items of standard verbal 
comprehension tests that would appear pos- 
sibly to introduce evaluative variance into 
the scores. In Word Completion, E is to 
write synonyms or short definitions for 
given words. The test was developed to
test the hypothesis that a completion form 
of vocabulary test would yield a more pure 
measure of CMU and that a multiple- 
choice form may have some secondary 
evaluation (EMU) variance. The correla- 
tion between Verbal Comprehension and 
Word Completion is .68, and both tests are 
found to be strong univocal measures of 
CMU, with the multiple-choice form 
having more CMU variance. 

Word Substitution and Synonyms were 
both designed for the EMU factor. In 
Word Substitution, E is to select a word 
that best fits into a given sentence as a 
substitute for an underlined word. The al- 
ternative choices are all about equally ac- 
ceptable, so that E is required to exercise 
evaluative thinking to find the best answer. 
Synonyms is also different from traditional 
vocabulary tests in that the choices are 
about equally acceptable and the difficulty 
of the words in each item is at a low level in 
order to reduce CMU variance in the test 
scores. In spite of the attempt to mini- 
mize cognitive variance and to maximize 
evaluative variance, both tests have sig- 
nificant loadings only on CMU, indicating 
that the evaluative process cannot be em- 
phasized over the cognitive process in the 
case of semantic units merely by increasing 
the competition among alternative choices. 

The outcome with respect to Word Sub- 
stitution and Synonyms answers an im- 
portant question regarding what kind of 
test measures factor CMU. Most vocabulary
tests demand only that the examinees
show acquaintance with words or some fa- 
miliarity with them. There is usually no 
demand for very penetrating or discrimi- 
nating knowledge of words. The presence of 
the two EMU-designed tests in the CMU 
lists indicates that ability to make fine dis- 
criminations among more familiar words 
is also a matter of factor CMU. 

The finding that CMU is also indicated 
by tests calling for fine discriminations be- 
tween word meanings could also account 
for the appearance of some CMU variance 
in other verbal tests where the vocabulary 
level has been kept low with the objective 
of reducing the CMU variance. Although 
the vocabulary level is low, some moder- 
ate or high-level discriminations on the 
basis of connotative aspects of word mean- 
ings may be involved, such as in Story 
Titles, Matched Verbal Relations, and 
Class Name Selection. 

CMC—Cognition of semantic classes: 


85. Verbal Classification—Part II (CMC)   [illegible]
84. Verbal Classification—Part I (CMC)    .57
56. Best Word Pairs (EMC)                 .34
88. Word Extension (EMI)                  .32
      (.39 EMI; .37 EMR)


The CMC factor appears to be well de- 
fined in this study. Since variables 85 and 
84 are two parts of the same reference test, 
the factor loadings of these variables are 
likely to be overestimated, being some- 
what inflated by variance specific to the 
test Verbal Classification. 

As in the symbolic study, the evalua- 
tive classes test that did not specify the 
criteria for evaluating class membership is 
loaded on the cognition factor. This be- 
havior of analogous tests shows strikingly 
the necessity for specified criteria for eval- 
uative processes to occur. 

CMR—Cognition of semantic relations: 


89. Word Linkage (EMR)                    .60
82. Verbal Analogies I—Part II (CMR)      .43
      (.30 EMR)

Although Word Linkage was developed
to measure the EMR factor, it was found 
to be the leading variable on what appears 
to be CMR. In each item of Word Linkage, 
E is to choose a word that is related to two 
other given words in different ways. The 
present finding seems to indicate that the 
task is heavily dependent upon cognitive 
abilities because the important difficulty 
lies in seeing the two different relation- 
ships, and this depends also upon E's having 
rich meanings for the words involved, 
hence the CMR variance. 

Verbal Analogies I, as administered in 
this study, has three equal, separately 
timed parts. For the process of factor ana- 
lysis, two part scores were wanted, so vari- 
able 82 is a combination of the second and 
the third parts, and variable 81 is the first 
part. Variable 82 contributes consistently 
to the CMR factor as expected, but varia- 
ble 81 is loaded significantly on the EMR 
factor. The reason for this difference in be- 
havior of the parts of Verbal Analogies I 
is not obvious. The CMR factor is not as 
sharply confirmed as was expected. The 
tests that are loaded on this factor, how- 
ever, suggest that the factor is similar to 
the CMR factor isolated in the past. 

CMS—Cognition of semantic systems: 


52. CAT Arithmetic Reasoning (CMS)   .57
      (.40 CMU)
74. Ship Destination Test (CMS)      .50
      (.33 EMR)
51. CAT Reading Vocabulary (CMU)     .47
      (.70 CMU)


This is the well-known factor of general 
reasoning (Kettner et al., 1956). Ship Des- 
tination Test and Arithmetic Reasoning 
have been the two leading tests defining 
this factor. The heavy involvement of one 
measure of academic achievement in the 
leading test probably accounts for the 
“pulling” of variance from Reading Vo- 
cabulary into CMS. When complex stand- 
ardized tests have been employed in factor 
analyses, they have tended to be complex. 

CMT—Cognition of semantic transformations:

76. Similarities—Part II (CMT)    .45
75. Similarities—Part I (CMT)     .53


The two parts of Similarities clearly 
define this reference factor. Owing to the 
fact that these two variables are two parts 
of the same test, their loadings are proba- 
bly overestimated. 

CMI—Cognition of semantic implica- 
tions: 


65. Pertinent Questions—Part I (CMI)     .45
66. Pertinent Questions—Part II (CMI)    .68
71. Seeing Problems (EMI)                .43
53. Apparatus Test (EMI)                 .34


The CMI factor was originally called 
“conceptual foresight” (Berger et al.,
1957). Two parts of Pertinent Questions 
are the leading variables defining this 
factor in the list above. Again, there is 
inflation of loadings from the specific-vari- 
ance source. 

In previous studies, Seeing Problems and 
Apparatus Test, one or both, commonly 
helped to define a factor called “sensi- 
tivity to problems,” which was defined as 
the “ability to see defects, needs, and de- 
ficiencies,” and was considered to belong to 
the category of evaluation. In this study 
these two tests were loaded significantly 
together on factor CMI. Seeing Problems 
and Apparatus Test have frequently had 
variances from cognitive factors, some- 
times from factor CMI and sometimes 
CMT. Whether the other tests that have 
helped to determine the sensitivity-to- 
problems factor in the past—Social Insti- 
tutions and Seeing Deficiencies—will also 
be found consistently related to either CMI 
or CMT, or both, is yet to be determined. 

DMI—Divergent production of semantic
implications:


68. Possible Jobs—Part I (DMI) .82 
69. Possible Jobs—Part II (DMI) .56 


The DMI factor has been isolated a 
number of times in previous studies. Since 
variables 68 and 69 are two parts of the 
same test, their factor loadings are probably
overestimated on this factor. No evaluation
tests appear to be related to it.
Whether evaluation factors in general are 
as easily differentiated from parallel di- 
vergent-production factors remains to be 
seen. There had not been much concern
about such discriminations, as indicated 
by the scarcity of other divergent-produc- 
tion reference factors in the study. The 
exception to this lack of concern was some 
possible confusion between DMI and EMI 
(sensitivity to problems) as seen in one 
study (Guilford, Merrifield, & Cox, 1961), 
where some tests shared variances from 
the two factors. In general either diver- 
gent- or convergent-production tests re- 
quire the producing of responses to fit given 
data in various ways, whereas in evalua- 
tion tests responses are presented to E for 
evaluation purposes. 

NMT—Convergent production of se- 
mantic transformations: 


67. Picture Gestalt (NMT)         .37


Picture Gestalt, a test designed for the 
NMT factor, has identified a factor, pre- 
sumably NMT, in agreement with a pre- 
vious study (Wilson et al., 1954). Picture 
Gestalt had been put into the battery to 
help bring out NMT and to see whether 
judgment tests would show any relations 
with NMT, as one judgment test had in 
the past. They did not show that rela- 
tionship in this study. 


Interpretation of the Semantic Evaluation 
Factors 


EMU—Evaluation of semantic units:

61. Double Descriptions (EMU)     .64
73. Sentensense (EMS)             .43
70. Product Choice (EMT)          .39


The EMU factor was hypothesized as an 
ability to evaluate the suitability or ade- 
quacy of a word or object in terms of meet- 
ing given criteria. In each item of Double 
Descriptions, E is to evaluate the extent to 
which objects possess two criterion prop- 
erties. The nature of the task in this test 
seems quite congruent with the conception 
of the definition of EMU. 


Two other tests designed for factor EMU 
(Word Substitution and Synonyms) are 
missing from the list of tests above. In- 
stead, they were loaded substantially on 
CMU, the cognition of semantic units. It 
is not surprising that essentially modified 
vocabulary tests should have some load- 
ing on CMU, but it is surprising that 
with the words of a fairly high level of 
familiarity and with the alternative answers 
designed to emphasize uncertainty of choice, 
there was not at least significant evaluative 
variance. 

Sentensense was originally designed for 
factor EMS, since it deals with the internal 
consistency of ideas or events expressed in 
a sentence, which is defensible as one of 
the forms of semantic system. The test 
was developed by analogy to Unusual De- 
tails, which led in defining what was be- 
lieved to be factor EMS previously. In 
both, inconsistencies are to be noted. The
significant loading of Sentensense on EMU 
may mean one of two things. One hypothe- 
sis is that the soundness of ideas or events 
expressed in a sentence was evaluated tak- 
ing each sentence as an integrated whole 
or unit rather than as a system of inter- 
related parts. In other words, the sentences 
were possibly treated as semantic units in 
the psychological processing of informa- 
tion. The shortness of the sentences in 
Sentensense and the relatively high ability 
level of the Es in this study might have 
made this possible. But each sentence con- 
tains two ideas or events, which suggests 
another hypothesis. The unit may have 
been each component of the sentence 
rather than the entire sentence. Decisions 
were made regarding the compatibility of 
the pairs of ideas or events as pairs of 
units. In the case of either hypothesis, an 
inference might be that more complex sen- 
tences might have shifted the test where it 
was expected, to the EMS factor. 

Product Choice was designed for factor 
EMT. In each item, from a set of alterna- 
tives E is to select an object or objects 
that can be used most adequately for a 
specified purpose. In order to find the right 
answers, E must evaluate how adequately 
each object is used in an unconventional
way. The significant loadings of this test
on EMU, however, indicate that the test
has little to do with the psychological
phenomenon of transformations. After all,
it is the objects that are to be evaluated, 
not the transformations as such. Such items 
of information are units. 

EMC—Evaluation of semantic classes:


55. Best Word Class (EMC) .54
57. Class Name Selection (EMC) .50
(.31 CMU)
58. Commonsense Judgment I (EMI) .33
(.33 EMI)
62. Important Facts (EMS) .38


Two of the three tests developed for 
EMC, Best Word Class and Class Name 
Selection, are the two leading tests defining 
the factor. Best Word Class deals with the 
evaluation of class names offered to repre- 
sent given single objects. Class Name Selec- 
tion deals with the evaluation of names 
given to represent sets of objects. 

Best Word Pairs, a test developed ac- 
cording to the EMC hypothesis, con- 
tributed instead to factor CMC. The E is 
asked to evaluate pairs of words in each 
item, saying which pair makes the best 
class. Since the nature of the class is not 
given, E has to discover the nature of each 
class for himself, which could account for 
the cognition variance in factor CMC. Also 
probably unfavorable for determining eval- 
uation variance in the test is the fact that 
the criterion "best" was not defined
sufficiently precisely.

Commonsense Judgment I was predicted 
for the EMI factor, but it shares its vari- 
ance equally with EMC. Commonsense 
Judgment I has to do with the evaluation 
of possible faults in a proposed plan. It is
difficult to see how genuine classes are
involved in this test.

Important Facts was designed for the 
EMS factor based upon the hypothesis 
that a problem situation is a kind of 
semantic system. The hypothesis was de- 
rived from several previously analyzed 
tests for the CMS factor, including Ship 
Destination Test and Arithmetic Reasoning,
both of which require E to structure
problems. Important Facts requires E to
judge the relative importance of a set of 
given facts in connection with solving sim- 
ple problems. However, Important Facts 
was found loaded on EMC and not on 
EMS, suggesting that what is evaluated in 
the test is something with class properties. 
The involvement of classes in this test is 
not obvious. 

EMR—Evaluation of semantic relations:


83. Verbal Analogies III (EMR) .59
64. Matched Verbal Relations (EMR) .50
(.34 EMI; .32 CMU)
81. Verbal Analogies I—Part I (CMR) .50
54. Best Trend Name (EMR) .48
88. Word Extension (EMI) .37
(.39 EMI; .32 CMC)
71. Story Titles (EMT) .36
(.48 CMU)
74. Ship Destination Test (EMS) .33
(.50 CMS)
63. Logical Reasoning (EMR) .32
(.40 EMI)
82. Verbal Analogies I—Part II (CMR) .30
(.48 CMR)


From the structure-of-intellect model, 
the EMR factor was hypothesized as the 
ability to evaluate relations between 
words or ideas. In accordance with the 
hypothesis, four tests were newly devel- 
oped or adapted from existing tests. All 
four are represented in the list above. 
Verbal Analogies III and Matched Verbal 
Relations were adapted from the usual 
tests of verbal analogies, which emphasize 
the cognition of relationships between 
given pairs of words. In tests of EMR, 
however, the relations in the first pairs of 
words are made obvious, in order to mini- 
mize the variance due to E's ability to cog- 
nize the relationships. On the other hand, 
the choices of alternative completions are 
made difficult, in order to maximize the 
evaluative variance by requiring E to com- 
pare the given alternatives in the light of 
the standard relations specified in the first 
pairs. The fact that Verbal Analogies III 
has a significant loading on EMR and has
no side loading illustrates the possible
separation of evaluative ability from cognitive
ability by imposing a difficult operation
at the appropriate phase of the verbal-analogies
task.

The well-known Logical Reasoning test
was included in this study to provide continuity
with the factor of logical evaluation
identified in the previous studies. The
present findings suggest that the logical- 
evaluation factor is not the same construct 
as the EMR factor in this study. 

Part I of Verbal Analogies I, a marker 
test for the CMR factor, has significant 
loadings on the EMR factor. It appears 
that Part I of this test is not a measure of 
the CMR factor, but has significant evalu- 
ative variance. Part II of Verbal Analogies 
I restricts itself more to factor CMR, however.

The small but significant EMR loadings 
of Word Extension and Story Titles prob- 
ably result from the difficult relationships 
between the given items and the alterna- 
tives. Ship Destination Test has a long 
history of being a pure measure of general 
reasoning, and no explanation of its EMR 
loading is apparent. 

EMS—Evaluation of semantic systems:


91. Word Systems (EMS) .57
80. Unlikely Things (EMS) .55


Word Systems and Unlikely Things are 
univocal tests for this factor. In Word 
Systems an item is composed of three 
matrices with approximate meaningful sequences
of words in the columns and rows
of each matrix to be judged for the best 
ordered system. Unlikely Things was 
adapted from a test called Unusual Details 
which asks E to find unusual items of in- 
formation in sketches of common situa- 
tions. The EMS factor identified in this 
study is similar to the factor called “ex- 
periential evaluation” defined by Unusual 
Details by Hertzka et al. (1954), but the 
conception of its nature is broadened con- 
siderably by the relation to the test, Word 
Systems. 

Two other tests designed for EMS, Important
Facts and Sentensense, have been
mentioned above. In retrospect, the present
knowledge concerning the CMS and NMS
factors has led to a number of hypotheses
concerning the nature of the EMS factor.
The present findings lead to reservations 
concerning the possible approaches based 
upon (a) evaluation of internal con- 
sistency of a two-idea sentence, and (b) 
evaluation of importance of facts needed 
to solve a given problem. It may be that 
in a test for EMS, the entire conceived 
problem should be the object of evalua- 
tion, not a particular detail of the prob- 
lem, or that internal consistency of the 
stated facts should be emphasized in such 
a test. 

EMT—Evaluation of semantic trans- 
formations: 


80. Useful Changes (EMT) .55


Although only a singlet, Useful Changes 
was rotated so that its unique variance was 
orthogonal to the rest of the factor matrix. 
This is the weakest evaluation factor in 
this study, but it was felt that the weak 
factor could offer guidelines about EMT 
test construction and also shed some light 
on the nature of the factor. 

Product Choice, a test designed for 
EMT, was loaded instead on EMU. In the 
interpretation of EMU, it was suggested 
that the objects presented as alternative 
choices in this test were the products of- 
fered for evaluation rather than the trans- 
formations, as such. 

In the area of divergent production, the 
tests of transformations require E to pro- 
duce numerous transformed ideas. On the 
other hand, the tests of transformations in 
convergent production require E to produce
single transformed ideas in order to attain 
specified goals. Two alternative approaches 
in developing the tests for evaluation of 
transformations were considered. The first 
approach is represented by Product Choice 
and Useful Changes, in which the task is 
to evaluate the objects to be transformed 
to attain a specified goal. The second 
approach is to require E to evaluate the results
of transformations or the transformed
information in the light of uniqueness
or adequacy as the criterion.

Story Titles, another test designed for
EMT, is based upon the second approach.
The test requires E to choose the best titles
that give new interpretations for short 
stories. The analyses indicate that Story 
Titles failed to identify the EMT factor; 
instead, the test shared substantial vari- 
ance with CMU. 

To understand why Useful Changes, a 
first-approach EMT test, is loaded on 
EMT while Product Choice is not, one 
must look closely at how E processes the
test information in responding. In Prod- 
uct Choice E is given two objects and is 
asked to decide what product from among 
the alternatives could be best made. Each 
product-alternative is then evaluated ac- 
cording to the limitations inherent in the 
two given objects—a process much like 
that of Double Descriptions. 

Useful Changes is a reversal; E is given 
a task to perform and is to decide which 
of three given objects could best be trans- 
formed to perform the task. In other words, 
E must transform (or attempt to trans- 
form) each object and then judge which 
transformed object would perform the job 
most adequately. It is the transformation 
that is evaluated, not the object. Future 
tests for EMT should demand E's actually
making or recognizing simple transforma- 
tions, and then judging them according to 
some criteria. 

EMI—Evaluation of semantic implica- 
tions: 


60. Complete Thoughts (EMS) .55
72. Sentence Selection (EMI) .54
63. Logical Reasoning (EMR) .40
(.32 EMR)
88. Word Extensions (EMI) .39
(.37 EMR; .32 CMC)
78. Synonyms (EMU) .35
(.39 CMU)
64. Matched Verbal Relations (EMR) .34
(.50 EMR; .39 CMU)
58. Commonsense Judgment I (EMI) .33
(.33 EMC)


Complete Thoughts, a test designed for 
factor EMS, is the leading test defining 
EMI. The test calls for decisions as to
whether statements are complete or incom- 
plete sentences, that is, whether implica- 
tions generated by the beginnings of sen- 
tences are fulfilled in the remaining words. 
It would appear that the alternative view- 
point of the implication-fulfillment aspect 
of sentences, completeness, does not apply 
to the evaluation of systems, for which 
Complete Thoughts was designed. 

The second strong EMI test, Sentence 
Selection, requires E to select the statement 
that is most probably true, in view of the 
given information, a single premise. The
test has high face validity in reference to 
the adopted conception of EMI, which is 
hypothesized as an ability to evaluate 
extrapolated information in the form of 
expectancies, predictions, antecedents, con- 
comitants, or consequences. 

The well-known test Logical Reasoning, 
a syllogistic test, has its highest loading 
on EMI, but shows some sign of factorial 
complexity, with a significant loading on 
EMR. Apparently, the relational aspects 
of premises and conclusions are not strong 
enough to emphasize the product of rela- 
tions. Conclusions are more like expect- 
ancies, following from the premise. 

Word Extensions, a test designed for 
factor EMI, has significant loadings on 
the EMI and EMR factors also. The test 
asks E to select the name of an object or 
attribute that is always implied by a 
given thing. Some of the easy items in this 
test might have been answered by evalu- 
ating the closeness of relationship between 
objects and the given thing, hence the rela- 
tion of the test to EMR. More obvious or 
more meaningful implications may be psy- 
chologically regarded as relations. With- 
out further evidence concerning each item 
of this test and its relation to EMI or 
EMR, it is best concluded that Word Ex- 
tensions has loadings on both the EMI and 
EMR factors. 

The fact that factors EMR and EMI 
share three tests in common in this analy- 
sis might suggest some obliqueness for 
these two factors. But it should be noted 
that each of these factors has at least three 
other tests not shared significantly by the 
other. The hypothesis of orthogonality 
cannot therefore be given up. Factors EMI
and CMU also have two tests in common,
but more than two for each factor that are 
not shared significantly. 
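
The overlap counts in the paragraph above can be checked against the EMR and EMI loading lists. The Python sketch below is only an illustration: the test names are taken from those two lists (with "Word Extension" and "Word Extensions" treated as the same test, which is an assumption), and the variable names are hypothetical, not part of the original analysis.

```python
# Tests listed with loadings on each factor, copied from the EMR and EMI
# lists in this section. Variable names are illustrative assumptions.
emr_tests = {
    "Verbal Analogies III", "Matched Verbal Relations",
    "Verbal Analogies I, Part I", "Best Trend Name", "Word Extensions",
    "Story Titles", "Ship Destination Test", "Logical Reasoning",
    "Verbal Analogies I, Part II",
}
emi_tests = {
    "Complete Thoughts", "Sentence Selection", "Logical Reasoning",
    "Word Extensions", "Synonyms", "Matched Verbal Relations",
    "Commonsense Judgment I",
}

shared = emr_tests & emi_tests    # tests listed under both factors
emr_only = emr_tests - emi_tests  # tests listed under EMR alone
emi_only = emi_tests - emr_tests  # tests listed under EMI alone

print(sorted(shared))
print(len(emr_only), len(emi_only))
```

On these lists the intersection holds three tests, and each factor keeps more than three tests of its own, which agrees with the argument for retaining the hypothesis of orthogonality.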


Discussion 


In the two studies reported, 12 factors 
of evaluative abilities were deduced from 
the structure-of-intellect model, and the 
validity of the hypotheses was tested by 
factor analysis. The hypothesized factors 
pertain to the evaluative operation and 
require tests having symbolic and semantic 
content. The hypotheses for the two stud- 
ies, couched in a factor-analytic design, 
would read as follows: 

1. The six symbolic-evaluation abilities
predicted by the structure-of-intellect
model are essentially independent of one
another and also of other known factors of
intelligence, particularly those of symbolic
cognition.

2. The six semantic-evaluation abilities 
predicted by the structure-of-intellect
model are essentially independent of one 
another and also of other known factors of 
intelligence, particularly those of semantic 
cognition. 

Mutual Independence of the Evaluation 
Factors 


Of the 12 hypothesized evaluation factors,
the independent existences of 11 have
been well supported by the fact that the
leading variables defining them have few
significant side loadings on other factors
of evaluation; what secondary loadings 
there are, are usually small and they are 
on cognition factors. 

Factor ESU, defined best by Symbol 
Identities and Letter “U,” is clearly the
symbol evaluation factor which had never 
emerged clearly before, due to inadequate 
numbers of tests for its delineation from 
perceptual speed. ESU tests have some 
small perceptual-speed involvement, but 
this could be due to the speeded natures 
of the tests defining the two factors. ESU 
is defined as representing the ability to 
judge quickly and accurately literal and 
numerical information as conforming or 
not conforming to certain necessary criteria
and is probably equivalent to the
ability commonly known as clerical speed
and accuracy. 

Led by Best Number Class and Sign 
Changes II, both somewhat complex, ESC 
appears to represent the ability to judge 
the goodness of class membership of sym- 
bolic information and the ability to be 
sensitive to and judge the class properties 
inherent in symbolic expressions. The es- 
sential nature of this factor lies in evalu- 
ating the “tightness” of the concept em- 
bracing the given symbolic information. 
Such an evaluation is necessary for judging 
concept names and for judging class mem- 
bership. This ability has been found to be 
involved to a significant degree in success 
in high-school mathematics (Guilford et
al., 1965). 

The ESR tests defined a factor involving 
the judgment of relationships among sym- 
bols on the basis of similarity and con- 
sistency. The leading test, Symbol Manip- 
ulation, calls for judgments regarding the 
truth or falsity of immediate consequences 
of symbolic propositions. The immediacy 
of the consequences is the important aspect 
for the relational nature of this test. 

Somewhat broader than a numerical es- 
timation factor, ESS reflects the ability to 
estimate values of symbols within a 
vaguely ordered system. The leading tests, 
Way-Out Numbers and Series Relations, 
support this interpretation for numerical 
series, but other tests define the factor 
more broadly to include sensitivity to errors
within the symbolic systems. Thus,
that nonoperational ability, sometimes 
referred to as “number sense” or “approxi- 
mation ability,” finds a place in the struc- 
ture of intellect as a unique, measurable 
factor of symbolic evaluation. 

The EST factor isolated was also broad 
in meaning, encompassing sensitivity to 
symbolic substitutions and reorderings. It 
is probable that tests of a more crypto- 
graphic nature, perhaps based upon univer- 
sal characteristics of codes (letter place- 
ments and frequencies) will function as 
stronger and purer measures of EST. If 
this is the case, another high-level job ele- 
ment will have been accounted for in the 
theoretical model of human intelligence. 


The clear factorial delineation of ESI is
aided (and complicated) by what is prob- 
ably common format variance. The connec- 
tions to be evaluated in Best Letter Set, 
Abbreviations, and Letter Problems are 
more remote than those evaluated in the 
relations tests. Whether the remote nature 
of the connections to be evaluated is neces- 
sary for the product of implications must 
await further investigation. It is an inter- 
esting conjecture that evaluation of impli- 
cations is a rather probabilistic decision- 
making process. 

EMU is clearly defined as the ability in- 
volved in judging the suitability or ade- 
quacy of ideas and objects in terms of 
meeting certain criteria. In all three tests 
loading on EMU, Double Descriptions, 
Sentensense, and Product Choice, the cri- 
teria are given and the ideas or objects are 
evaluated in terms of how they meet those 
criteria. 

Best Word Class and Class Name Selec- 
tion identify factor EMC as being almost 
exactly parallel to the symbolic factor, 
ESC. The criterion for judgment is the
“tightness” of the concept or class and the 
things to be judged are either members of 
the class or concept names. 

The EMR factor, as represented by 
Verbal Analogies III and Matched Verbal
Relations, is also closely parallel to the 
ESR factor and is concerned with judg- 
ments regarding the similarity or con- 
sistency of relationships between words or 
ideas. Like ESR, the EMR factor also 
embraces relationships of a consequences 
type, when the connections are very imme- 
diate. In distinguishing relations and im- 
plications, we may consider the product of 
implications as a broad category of con- 
nections between ideas, some of which, 
when specifiable because they have unique 
characteristics, are processed as relations. 

The factor of logical evaluation has been 
defined as an ability involving sensitivity 
to consistency of logical relationships and 
has been consistently defined by tests of 
decisions about the correctness of conclu- 
sions drawn from premises. The factor was 
identified with EMR in the SI model, with 
the thought that each statement, premise,
or conclusion expresses a relationship of 
some kind. But from another point of 
view, conclusions are inferences, and in- 
ferences are implications, and one could 
say that in tests such as Logical Reason- 
ing, implications are being judged. This 
line of thinking led to doubts about identi- 
fying logical evaluation with EMR rather 
than EMI. The results are rather decisive 
that other kinds of tests, especially tailored 
for EMR, determine a factor that has a 
better claim to that spot in the SI model. 

The EMS factor as defined by Word 
Systems and Unlikely Things is probably 
broader than what can be confidently con- 
cluded from the nature of its two tests. In 
any case, it is much broader than the old 
factor of “experiential evaluation,” but 
probably subsumes the old concept, in 
part. The essential nature of this factor is 
concerned with judgments about or within 
complexes of meaningful information, 
where the complex is a necessary considera- 
tion for judgment. Where the stimuli are 
not complex, but can be processed as simple 
information, as in Sentensense, the evalua- 
tion is of a unit, not a system. 

This study resulted in revealing only a 
trace of the factor EMT, defined by Useful 
Changes. Future tests for this factor must 
employ transformation as the information 
to be evaluated, not transformed objects. 
This principle is difficult to follow using 
paper-and-pencil tests, but Useful Changes 
should serve as a prototype of the tests that 
might fruitfully be tried. 

The EMI factor may be regarded as a 
new factor, represented best by Complete 
Thoughts and Sentence Selection. It is defi- 
nitely not the former sensitivity-to-prob- 
lems factor, whose tests now appear to be 
cognitive and to belong with factor CMI. 
Nor is it clearly the former “logical evaluation,”
defined mainly by the test Logical
Reasoning, and other tests of similar na- 
ture, although there is something in com- 
mon to the two factors, with Logical Rea- 
soning furnishing a link. It should be noted 
that the factorial behavior of Logical Rea- 
soning is very similar to that of its parallel 
symbolic test, Symbol Reasoning. The 
probabilistic nature of the ESI factor,
however, does not appear well founded as
an interpretation for EMI, where the impli- 
cations can only be defined as being rela- 
tively remote. 

The cell for EMI in the SI model was 
previously assigned to the factor called 
sensitivity to problems. Two reasons arising
from this study call for rescinding that
decision. One, just mentioned, is the finding 
of a more suitable factor for that spot. The 
other is that the tests used as markers for 
the sensitivity-to-problems factor accom- 
modatingly went together elsewhere, join- 
ing with the marker tests for CMI, the 
cognition of semantic implications. The
change from EMI to CMI is a change in 
operation only. Sensing problems and seeing 
implications are compatible ideas, at least. 


Independence of the Evaluation Factors 
from the Reference Factors 


All of the five kinds of operations pre- 
dicted by the structure-of-intellect model 
were involved in these studies, but obvi- 
ously not equally so. Outside the area of 
evaluation under primary consideration, 
there were 11 different cognition factors 
represented by marker tests, 1 memory 
factor, 2 divergent-production factors, 4 
convergent-production factors, and 1 figu- 
ral-evaluation factor. The reference fac- 
tors were included in the analysis with the 
usual concern lest some of the new evalua- 
tion factors not be shown to be distinct 
from parallel factors in other operation 
categories or lest some of the variances of 
new experimental tests not be well ac- 
counted for. 

The greatest concern was regarding the
demonstration that the evaluation abilities 
be differentiated from the parallel cognition 
factors. In constructing evaluation tests, it 
is not easy to be sure that cognitive vari- 
ance has been ruled out by the experimental 
controls in the test conditions, or even that 
cognitive variance may not dominate the 
test. Other reference factors were brought 
in because new experimental tests were 
suspected of involving operation vari- 
ances other than evaluation. 

Evaluation was defined earlier as the 
process of reaching decisions or making 
judgments concerning the goodness of in- 


formation in terms of specified criteria. It 
was thought that one unique characteristic 
of good evaluation tests, as distinguished 
from cognition tests, would be the condi- 
tion of uncertainty at the point of the re- 
sponse to alternative choices rather than at 
the point of knowing the specifications of 
the criteria. In developing the tests of eval- 
uation, attempts were made (a) to specify 
the criterion for evaluation as clearly as 
possible, (b) to induce the uncertain condi- 
tion by making alternative choices about 
equally “good” (the “uncertainty principle”),
and (c) to keep the difficulty of
words in the items at low levels. Although 
these principles appeared to work in some 
instances, in the measurement of evalua- 
tion, there are several instances in which 
tests designed for evaluative factors, follow- 
ing the principle of maximizing uncer- 
tainty, are loaded on factors of cognition. 
Careful observation of these tests might 
throw light upon the nature of the evaluative
processes. It might also serve to point
out some of the problems in test develop- 
ment. 

Word Substitution and Synonyms were 
both designed for the EMU factor, empha- 
sizing the uncertainty principle. In Word 
Substitution, E is asked to select a word 
that best fits into a given sentence in 
place of an underlined word. Synonyms is 
similar to the usual multiple-choice vocabulary
tests except that the alternative
choices are all about equally acceptable 
and that the difficulty of words, as such, is 
kept at a low level. In spite of the attempt 
to maximize evaluative variance and mini- 
mize cognitive variance in this manner, the 
analysis indicates that both tests are essen- 
tially CMU tests. The competing choices in 
Synonyms and Word Substitution probably 
increased the CMU variance by requiring 
E to exercise finer discrimination among 
the meanings of words. The fact of com- 
peting choices alone does not ensure evalu- 
ative variance. 

One of the important differences between 
these two tests and Double Descriptions, 
the leading test for EMU, is that in the 
latter the criteria for evaluation are explicitly
stated in the items whereas the criteria
for Word Substitution and Synonyms
are roughly stated as the adequacy of a
word in a given sentence, or closeness of the 
meanings of words. 

In both Word Substitution and Syno- 
nyms, the criteria are not explicitly stated 
in connection with each evaluative act. In 
either case, E has to cognize his own cri- 
terion, which is the exact meaning of the 
given word. Success in this step is crucial 
for success in answering the item. A rea- 
sonable conclusion is that once E has de- 
fined the words precisely, there is not much 
uncertainty as to which match is best. 

A similar factor switch occurred on the 
parallel symbolic factor, ESU. The test 
Correct Spelling does not provide specific 
criteria for reaching decisions other than 
drawing upon memory storage for some- 
thing with which to compare the given 
item of information. Correct Spelling is 
loaded on CSU, the parallel cognition fac- 
tor. 

In other forms of tests, the specifications 
of criteria do not seem to present a serious 
problem. In each item of Verbal Analogies 
III, for example, the standard pair of 
words is given as a criterion followed by 
the first word of the second pair and the 
alternative choices. First, E is to cognize 
the relationship in the standard pair of 
words, a step that is made fairly easy for 
him. The second step is fitting the first 
word of the second pair into the criterion 
relationship discovered in the first pair. 
This step can be identified as the construction
of a search model. The third step is to
try out the alternative words to see which 
one best fits the search model. In construc- 
tion of this test, there was an attempt to 
introduce the uncertain state of affairs in 
the third step in order to increase evalua- 
tive variance. In other words, it is where 
the uncertainty is encountered that mat- 
ters for testing purposes, not the fact that 
there is uncertainty. 

In general, the strategy in test develop- 
ment employed in these studies was to de- 
vise tasks in which only the evaluative 
process is difficult in the total problem- 
solving activity. The results seem to sug- 
gest that certain approaches found to be 
promising in one test are not necessarily 
applicable in other tests, and certain approaches
found to be promising in one
product category of the structure-of-intel- 
lect model do not necessarily apply in 
other product categories. There is sufficient 
evidence, however, that evaluative abilities 
can be measured separately from cognitive 
abilities by presenting adequate specifica- 
tions of criteria for evaluation and provid- 
ing the appropriate level of uncertainty in 
connection with the response to the alter- 
native choices. 

A second cognition-evaluation problem 
was cleared up in both studies. Just as the
semantic sensitivity-to-problems tests, hypothesized
for EMI, loaded instead on
CMI, so did the parallel symbolic test.
It appears that sensitivity to problems is 
similar to foresight—one who plans well is 
one who is sensitive to potential problems. 

A third example occurred with the classes 
tests of both studies. Tests of ESC were 
direct translations of the EMC tests into 
symbolic content. Best Number Pairs was 
an adaptation of Best Word Pairs. The two 
performed very similarly in that both loaded
instead on their respective cognition parallels,
factors CSC and CMC. In both cases
it can be pointed out that for measurement 
of evaluation rather than cognition, there 
is a need to state explicitly the criteria 
upon which judgments are to be made, and 
possibly provide models for comparison or 
to describe them, as in Double Descriptions 
and Letter “U.” 

A more general indication of the overall 
cognition-evaluation confusion is given
by the frequency with which tests designed 
for one operation have significant loadings 
on factors in the other operation category. 
In both studies combined, the hypothe- 
sized cognition tests exhibited 28 loadings 
on cognition factors and 6 loadings on 
evaluation factors, implying that evalua- 
tion does not account for much of the 
variance in cognition tests. Unfortunately, 
the converse is not so strikingly clear. The 
evaluation tests had 22 loadings on cogni- 
tion factors and 48 on evaluation factors, 
indicating that cognition does account for 
some of the variance in evaluation tests. 
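
The asymmetry described above can be made concrete by expressing the reported counts as proportions. The following Python sketch uses only the four counts given in the text; the function and variable names are hypothetical and introduced purely for illustration.

```python
# Counts of significant loadings reported in the text for both studies combined.
loadings = {
    "cognition tests": {"cognition": 28, "evaluation": 6},
    "evaluation tests": {"cognition": 22, "evaluation": 48},
}

def cross_share(counts, own):
    """Fraction of a test group's loadings that fall outside its own operation."""
    total = sum(counts.values())
    return (total - counts[own]) / total

cog_cross = cross_share(loadings["cognition tests"], "cognition")     # 6 / 34
eval_cross = cross_share(loadings["evaluation tests"], "evaluation")  # 22 / 70
print(f"{cog_cross:.3f} {eval_cross:.3f}")
```

On these counts, roughly 18 percent of the loadings of the cognition tests fall on evaluation factors, whereas roughly 31 percent of the loadings of the evaluation tests fall on cognition factors.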

It would appear to be easier to keep eval- 
uation out of cognition tests than to keep 
cognition out of evaluation tests, although
this depends somewhat upon where the rotations
happen to go in a particular analysis.
At any rate, from another point of view
the experimental evaluation factors appear 
to be well differentiated from other opera- 
tion factors. Although one might expect 
evaluation to play roles in connection with 
convergent-production tests, in which responses
must be narrowed down to a single
one for each item, as far as can be seen
there is little reason to doubt the distinctness
of those two operation categories. The
tests of memory, divergent production, and
convergent production exhibited their loadings
mainly on their respective factors,
and the factors were relatively clear of 
evaluation-test loadings. The expectation 
that such confusions would not be great was 
confirmed. 


The Properties of Evaluation 


What have the studies done for further 
elucidation of the concept of the operation 
of evaluation? What aspects of the con- 
cept possibly need changing as a result of 
the new information about factors and 
their tests, and what new features come 
into the picture? The answers to these gen- 
eral questions shall be considered more 
specifically in terms of (a) the kinds of 
judgments that belong in the picture of 
evaluation as defined by the tests; (b) 
what kinds of criteria for judgment are 
pertinent to measurement of the evaluation 
abilities; and (c) whether there are any 
new restrictions to be placed on the con- 
cept. The answers to these issues rest upon 
the kinds of tests that serve to measure 
their evaluation factors well and those that 
do not, when all have been hypothesized to
do so.

Kinds of judgment. Much was said from 
one place to another in this report about 
the two classes of tests: sensitivity versus 
estimation. More fully spelled out, these 
terms mean sensitivity to error or dis- 
crepancy on the one hand and, on the 
other, judgment of relative nearness of a 
number of items of information (for any 
kind of product) to a kind of model item 
of information on the same continuum. In 
more operational terms, the contrast may 
be stated in terms of "absolute" versus
"relative" kinds of judgments, as in psychophysics.

It should be abundantly clear, from the 
way in which both sensitivity and estima- 
tion types of tests are commonly related 
significantly to the same factors, that both 
kinds of judgments apply. If we compare 
evaluation tests from the two categories 
that clearly involve absolute versus relative 
judgments, however, we find that some fac- 
tors tend to be more strongly defined by 
tests of absolute judgments or sensitivity, 
while others tend to be defined by tests of 
relative judgment or estimation. It is possi- 
ble that further research will emphasize 
such between-factor differences, but at
present it is safe to assume that each factor
can be measured by both types of tests.

Of the 14 leading univocal tests for the 
12 obtained evaluation factors, 11 were 
tests of relative judgment or estimation and 
3 were of absolute judgment or sensitivity. 
It should be noted, however, that the ma- 
jority of the experimental tests were of the 
former type, and therefore chance alone 
would favor such a result for relative- 
judgment tests. 

Kinds of criteria. The most common 
criteria employed in evaluation tests have 
been identity versus nonidentity (Percep- 
tual Speed) and consistency versus incon- 
sistency (Logical Reasoning). Other kinds 
of criteria have been mentioned in connec- 
tion with various tests and for different 
factors. Some examples are: fitness for class 
membership (Letter “U” and Double Descriptions);
relative familiarity (Familiar Letter Combinations);
relative similarity (Series Relations and Verbal
Analogies III); conformity to principles (Correct
Number Series, Correct Letter Orders, and
Word Systems); relative probability (Decoding
and Unlikely Things); and solvability of
problems (Letter Problems). Such a
variety of criteria can possibly be brought
under more abstract and more general criterion
categories, since such terms as “identity,”
“similarity,” and “conformity” suggest
that the criteria tend to be logical in
nature and that they represent continua of 
one kind or another. 

The scope of evaluation. The scope of 
processes under the heading of evaluation 
is indicated somewhat by the variety of
criteria that may be involved. The dis- 
cussion of this topic above revealed some- 
thing of the apparent variety of criteria 
that is represented in the experimental 
tests. But it was suggested that such cri- 
teria may be limited to the general logical 
category, which would rule out of consider- 
ation criteria involving esthetic and ethical 
values. There is no doubt that such values 
exist and such areas of judgment call for 
evaluative operations. At the present, they 
do not seem to fit into the structure-of-in- 
tellect model. Perhaps they call for two 
complete additional sets of evaluation 
abilities or processes, whether parallel to 
the present theoretical set or not. Another 
possibility is that esthetic judgments may 
be applicable in the decision processes 
concerning figural information, and ethical 
or moral ones concerning behavioral infor- 
mation. Such an extension of the concept of 
evaluation awaits further investigation. 

As to the definition of evaluation itself, 
the kind of restriction just discussed sug- 
gests that it is going too far to say that 
evaluation is a matter of reaching deci- 
sions regarding goal satisfaction. Kinds of 
goals are much too numerous, and satisfac- 
tion in terms of logical criteria cannot cover 
all cases. In defining the restricted kinds of 
evaluation represented in the structure of 
intellect, it would seem desirable to elimi- 
nate reference to “goal satisfaction.” 

As a general impression, from considera- 
tion of the experimental tests and their 
factors in this study, the importance of an 
act of comparison seems to stand out. The 
observation was also made by Hertzka et 
al. (1954) that the core of the definition of 
evaluation is the concept of comparison. 
The following current definition of evalua- 
tion can be suggested: Evaluation is a 
process of comparing information with 
known information according to logical 
criteria, reaching a decision concerning 
criterion satisfaction. 


Recommended Tests for the Evaluation 
Factors 


Although many of the experimental tests 
were disappointing, loading on too many
factors to be considered univocal, or not
loading on factors for which they were de- 
signed and in terms of which they could be 
understood, several performed in a manner 
that provides some confidence in recom- 
mending them tentatively as the best avail- 
able measures of their factors. The recom- 
mended tests appeared to be fairly univocal 
and, in other respects, reasonable measures 
of their factors. The recommended meas- 
ures for the experimental evaluation factors 
are: 


ESU   Symbol Identities
      Letter “U”

ESC   Best Number Class
      Sign Changes II

ESR   Symbol Manipulation
      Related Words I

ESS   Way-Out Numbers
      Series Relations

EST   Jumbled Words

ESI   Abbreviations

EMU   Double Descriptions
      Sentensense

EMC   Best Word Class
      Class Name Selection

EMR   Verbal Analogies III
      Best Trend Name

EMS   Word Systems
      Unlikely Things

EMT   Useful Changes

EMI   Complete Thoughts
      Sentence Selection


Summary 


Two studies were designed to test the implications
of, and extend the empirical foundations
underlying, the structure-of-intellect model.
The studies attempted to
identify basic traits with respect to which 
individuals differ from one another in 
evaluative performances. The 12 hypothe- 
sized abilities of symbolic and semantic 
evaluation were selected for investigation. 
The major objective was to determine 
whether such distinguishable abilities could
be demonstrated, distinguishable from one
another and also from other intellectual 
abilities. A secondary objective was the 
determination of what mental processes 
are evaluative.

The 12 hypothesized evaluation factors
are as follows. Six symbolic factors were
investigated in the first study: evaluation of
symbolic units (ESU); evaluation of symbolic
classes (ESC); evaluation of symbolic
relations (ESR); evaluation of symbolic
systems (ESS); evaluation of symbolic
transformations (EST); and evaluation of
symbolic implications (ESI). Six semantic
factors were investigated in the second
study: evaluation of semantic units
(EMU); evaluation of semantic classes
(EMC); evaluation of semantic relations
(EMR); evaluation of semantic systems
(EMS); evaluation of semantic transformations
(EMT); and evaluation of semantic
implications (EMI).

In order to demonstrate the distinctness 
of the hypothesized factors from those al- 
ready known, 19 reference factors, previ- 
ously confirmed, were analyzed as experi- 
mental controls. The reference factors for 
the symbolic study included all of the 
known cognition and convergent-produc- 
tion factors concerned with symbolic infor- 
mation, verbal comprehension, numerical 
facility, a symbolic flexibility factor, and 
perceptual speed. In the semantic study, 
the reference factors included all the known 
cognition factors concerned with the seman- 
tic information, also semantic elaboration, 
and redefinition. From the list of reference 
factors it can be seen that special attention 
was paid to differentiating the evaluation 
factors from parallel factors of cognition. 

Two different 8-hour test batteries were 
constructed to accomplish the experimental 
objectives. A total of 50 tests (25 evalua- 
tion tests and 25 markers) were adminis- 
tered to 225 high-school seniors for the 
symbolic analysis and 41 measures (22 
evaluation tests and 19 markers) were ad- 
ministered to 202 high-school juniors for 
the semantic analysis. Principal-axes fac- 
tors were obtained, 18 for the symbolic 
analysis and 14 for the semantic analysis, 
and were rotated both graphically and 
analytically, observing the criteria of sim- 
ple structure, positive manifold, and psy- 
chological meaningfulness. 

All the hypothesized factors emerged, 
defined largely by the tests designed to 
measure them. The reference factors were 
also successfully isolated, indicating gen- 
erally good factorial invariance for the 
marker tests. 


Among the six symbolic-evaluation fac- 
tors, ESU was clearly defined as a factor 
that could be described nontechnically as 
clerical speed and accuracy, the ability to 
judge rapidly symbolic material in terms 
of identity or error. The ESC factor was 
least clear-cut, but it involved sensitivity 
to class properties. ESR was clearly iso- 
lated as the ability to make choices among 
symbolic relationships on the bases of iden- 
tity and consistency. The ESS factor ap- 
peared to involve the estimation of simi- 
larity among and values within symbolic 
series. The clear isolation of EST defined 
it as the ability of sensitivity to the fulfill- 
ment of criteria by rearrangements and 
substitutions of letters within words. The 
ability to judge the consistency or probabil- 
ity of implications from symbolic material 
was represented by the ESI factor. 

Of the six hypothesized factors of semantic
evaluation, EMU, EMC, and EMR are
essentially new factors, well supported by 
the results and parallel in nature to their 
analogous symbolic factors. EMR replaces 
the factor formerly known as “logical eval- 
uation,” erroneously allocated to the EMR 
cell of the structure of intellect. Factor 
EMS was essentially a verification of the 
previously recognized factor of “experien- 
tial evaluation,” but the new EMS factor 
is somewhat broader. A new factor re- 
places “sensitivity to problems” with a 
better claim to the cell EMI. The implica- 
tions factor involves decision making re- 
garding reasonableness of relatively remote 
consequences or expectancies. 

Relatively clear separation among abili- 
ties according to operation, content, and 
product was interpreted as substantially 
contributing to the value of the unified the- 
oretical model of the structure of intellect 
as a hypothetico-deductive theory for gen- 
erating predictions concerning individual 
differences in intellectual functioning. 
THE ASSESSMENT CENTER IN THE MEASUREMENT OF
POTENTIAL FOR BUSINESS MANAGEMENT


DOUGLAS W. BRAY AND DONALD L. GRANT*
American Telephone and Telegraph Company 


The assessment process in the Bell System’s Management Progress Study
is described, and the results of several analyses of the process are reported.
Included are studies of assessment staff evaluations, contributions to the 
process of selected techniques, and relationships of assessment data to sub- 
sequent progress in management. The results, based on 355 young mana- 
gers, indicate that the evaluations by the assessment staffs were influenced 
considerably by their overall judgments of the men assessed but also made 
many intraindividual discriminations. The results also show that all of the
techniques studied made at least some contribution to the judgments of the 
assessors. Situational methods (group exercises and In-Basket) had con- 
siderable influence; paper-and-pencil ability tests had somewhat less in- 
fluence; personality questionnaires were given the least weight. (Projective 
methods and interviews were not included in the analyses but are being 
studied.) The relationships between assessor judgments and subsequent 
progress in management, though covering only a relatively short time pe- 
riod, indicate that the assessors’ predictions were quite accurate. The results 
also show that a complex of personal characteristics is more predictive of 
progress than any single characteristic. Some of the characteristics, however, 
appear to have higher relationships to progress than do others. Of the
techniques studied the situational methods and paper-and-pencil ability tests
are more predictive of progress than the personality questionnaires. 


* The senior author is responsible for the design
of the Management Progress Study and the
assessment center method used in the Study; the
junior author planned and carried out all the
analyses included in this report.

The assessment center method of evaluating
individual characteristics and
potential, although not widely studied or
used because of its expense and complexity,
has continued to command interest. Early
applications of the method occurred primarily
in military or academic contexts,
but there is presently a growing experimentation
with assessment centers in American
business as a method of evaluating managerial
ability.

The Bell System’s Management Progress
Study (Bray, 1964) offers a unique opportunity
to study the assessment process. The
Study, which began in 1956, is a longitudinal
study of the development of young men
in a business management environment.
Assessment per se is only one of the several
research methods being used.

The uniqueness of the Study from the
standpoint of studying the assessment process
arises from several aspects of the Study
design:


1. There is no contamination by the as- 
sessment results of the subsequent criterion 
data. Along with all other information col- 
lected on the 422 subjects of the Study, the 
assessment data are being held in strict 
confidence. Thus the judgments of the as- 
sessment staff have had no influence on the 
careers of the men being studied. 

2. The subjects of the Study have been 
or will be reassessed. A means thereby ex- 
ists for taking into account the effects of 
growth on assessed characteristics. 

3. Because of the longitudinal nature of 
the Study and of the vast amount of data 
being accumulated, there is very little limit 
to the number of kinds of analyses involv- 
ing the assessment data which can be made. 
As a result, the interrelationships of the
assessment data with a variety of criteria
can be and are being investigated.


Nature of Assessment 


The origin of the use of multiple assess- 
ment procedures on a large scale is credited 
to German military psychologists (OSS,
1948). The British adapted the procedures 
to the screening of officer candidates, and 
the United States Office of Strategic Serv- 
ices (OSS) took over the approach from the 
British during World War II (OSS, 1948). 
Since that time several studies of various 
applications of these procedures have been 
reported in the literature (see especially 
Taft, 1959, and Cronbach, 1960, for sum- 
maries of such studies). 

In general, the procedures employed have 
involved the use of multiple methods for 
obtaining information on individuals, 
standardization of these methods and those 
for making inferences from them, and the 
use of several assessors whose judgments 
are pooled in arriving at evaluations of the 
persons assessed. As might be expected, 
many variations in methodology have been 
reported. 

A major contribution of the multiple as- 
sessment approach has been the use of 
situational tests or exercises. Though not 
restricted to multiple assessment, the ap- 
plication of situational techniques to as- 
sessment has been featured in such pro- 
grams as that of the OSS. 

Situational methods offer the potential 
of adding to the scope of human character- 
istics which can be evaluated. Though 
much more expensive and time consuming 
to administer than paper-and-pencil tests 
and questionnaires, the need to find ways 
of evaluating characteristics not covered by 
the latter is sufficient to warrant extensive 
experimentation with relatively elaborate 
techniques. 

Assessment procedures also contrast with 
psychometric ones in the way the resulting 
data are combined. Psychometric ap- 
proaches depend on mathematical methods 
for accomplishing this purpose whereas as- 
sessment approaches combine the data 
judgmentally. 


Experience with Assessment 


Taft (1959) has pointed out that the 
reasons for using multiple assessment approaches
have varied (e.g., personality research,
selection, validation of techniques).
As a consequence, the foci of research using
such procedures have differed. He noted,
however, that “All assessment programs 
involve studies of the link between two or 
more pieces of behavior.... Some of this 
behavior is known as assessment behavior 
and some as criterion behavior [p. 377].” 

The majority of studies reporting use of 
multiple assessment procedures have fo- 
cused on prediction. Many of these studies, 
particularly several of the earlier ones, re- 
ported results which raised serious ques- 
tions regarding the “predictive validity” of 
such procedures. Other studies, however, 
have tended to support the value of using 
assessment approaches for predictive pur- 
poses. 

Among the former are those by the OSS 
(1948), the study by the Veterans Adminis- 
tration of clinical psychologists (Kelly & 
Fiske, 1951; Kelly & Goldberg, 1959), the 
Menninger School of Psychiatry study 
(Holt & Luborsky, 1958), and the study of 
Air Force officers (MacKinnon et al.,
1958). Studies reporting relatively high 
predictive validities for the assessment pro- 
cedures used include two studies cited by 
Cronbach (1960), that is, those of British 
civil service candidates (Vernon, 1950) and 
of American OCS applicants (Holmen, 
Katler, Jones, & Richardson, 1956). In 
addition, two relatively recent studies, one 
of Scandinavian airline pilots (Trankell,
1959), and another of managerial personnel 
(Albrecht, Glaser, & Marks, 1964) report 
relatively high correlations between assess- 
ment results and subsequent measures of 
performance.

Though no firm conclusions regarding 
the predictive validities of multiple assess- 
ment procedures can be drawn from the 
rather mixed findings of published research, 
it does appear clear that the more accurate 
predictions were obtained where the per- 
formance to be predicted was clearly de- 
fined, the assessment results did not restrict 
the range of subsequent criterion perform- 
ance, and the criterion measures employed 
were not limited by low reliability and
questionable validity. None of the pub- 
lished studies, incidentally, report com- 
pletely invalid results, though in some the 
correlations with performance criteria are 
disappointingly low. Furthermore, in such
studies as that of clinical psychologists 
(Kelly & Fiske, 1951) the paper-and-pencil 
tests used predict subsequent performance 
as well as do the assessment ratings. 


Theoretical and Methodological 
Considerations 


As has been noted (Albrecht et al., 1964) 
assessment procedures have been applied 
without the benefit of much prior develop- 
mental effort, either theoretical or empiri- 
cal. The OSS (1948) did follow a rather 
well-developed rationale in applying the 
procedures, though it is evident from their 
report that considerable trial and error ac- 
companied application. Many modifications 
in the procedures were made as the pro- 
gram developed. 

Subsequent studies have contributed lit- 
tle to the formulation and testing of as- 
sessment principles. After reviewing the 
pertinent literature Taft (1959) discusses 
many of the issues involved, including the 
different strategies used in predicting cri- 
teria performance. Stern, Stein, and Bloom 
(1956) have published the most thorough 
discussion, illustrated by small-sample 
studies, of assessment “models.” Four al- 
ternative approaches are described along 
with considerations of the advantages and 
disadvantages of each. 

Though no firm set of principles has
emerged from such discussions, certain aspects
of assessment have been highlighted
and warrant consideration, even though
brief.

Prior analysis. Much emphasis has been 
placed on the need for a thorough study of 
the total situation for which the assessment 
is being made. Included are such factors as 
behavioral requirements, environmental in- 
fluences, functional roles, and value judg- 
ments of "significant others" (Stern, Stein, 
& Bloom, 1956). From such an analysis are 
derived the variables for which assessment 
staff judgments are to be made. 

Assessed characteristics. Based on the 
prior analysis, the characteristics assessed 
(usually including an "overall" evaluation) 
must be defined in behavioral terms so as to 
facilitate appropriate judgments by the
assessment staff. No firm set of principles
for determining the number of characteristics
to be assessed nor for assuring adequate
definitions has been advanced. In practice
the number and nature of such variables 
have varied widely. Principles of rating, 
such as developed by Wherry (1952), are 
pertinent, however, to making such de- 
cisions. 

Techniques. Methods for obtaining relevant
information on the persons to be assessed
are selected or developed in accord
with the characteristics for which judgments
are to be made. Again, no firm set of
principles for making these decisions has
been advanced. Multiple methods have been
favored by many practitioners. In practice, 
a variety of methods (including interviews, 
situational tests, paper-and-pencil tests, bi- 
ographical questionnaires, and projectives) 
have been used. Little information on the 
relative values to the assessment process of 
the various methods has been reported. The 
number of methods used also has varied 
widely. 

Staffing. Presumably, the success of as- 
sessment depends considerably on the com- 
petence of the assessors. If so, the size and 
organization of the staff, their selection, the 
quality of training given them, and their 
supervision should influence the results ob- 
tained. Professionals have tended to favor 
professionally trained persons for assess- 
ment activities, though in one of the studies 
cited by Cronbach (1960) an “amateur” 
staff provided quite valid predictions. 
Again, pertinent information leading to a 
set of principles is needed. 

The assessed. In practice, the numbers of 
persons assessed at a particular time have 
varied considerably. Logistical requirements 
obviously have influenced decisions regarding
this aspect of assessment. Presumably,
however, some optimal ratio of assessees to
staff and to the number and nature of techniques
may exist, but it has not been elaborated.

Evaluating the assessees. The final step 
in the assessment process is that of evalu- 
ating the assessees. The entire assessment 
staff, or selected individuals therefrom, re- 
view the evidence and rate each person on 
each of the assessment variables. Again,
there have been many variations in the 
rating procedures used. Little attention has 
been given such factors as the number of 
raters, the time lag between observation and 
rating, the number of assessees rated at a 
time, and the mechanics for pooling indi- 
vidual ratings. Principles of rating such 
as developed by Wherry (1952), previously 
referred to, have pertinence to this crucial 
step in the assessment process. 

Evaluating results of assessment. De- 
termining the “validity” of procedures has 
been a major concern of practitioners of 
this art. Much attention has been given to 
“predictive” validity, very little to “construct”
validity. Problems in research design
have been discussed at length by several
investigators (e.g., Albrecht et al.,
1964; OSS, 1948). Criterion problems have
proven as thorny for these investigators as 
they have for the psychometrician. Furthermore,
where prior screening has been effective
and/or the assessment results have influenced
personnel decisions, allowance for
the consequent restrictions on range of subsequent
performance has been inadequate.
It seems reasonable to believe that an accurate
evaluation of assessment results requires
appropriate criterion measures, a
representative range of criterion behavior,
including sufficient time following assess- 
ment to permit the development of relevant 
criterion behavior, and adequate controls 
over other factors which may introduce ir- 
relevance in the criterion measures. 
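
The effect of restriction of range on an observed validity coefficient can be illustrated with a small simulation. The sketch below is not drawn from any of the studies cited. It assumes a bivariate normal relation between an assessment score and a later criterion, an arbitrary underlying correlation of .50, and a hypothetical rule under which only the top 30% on the assessment are retained. All names and parameter values are illustrative.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(true_r=0.5, n=20000, keep_top=0.3, seed=0):
    """Correlation in the full group versus the group retained on the assessment score."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        assessment = rng.gauss(0, 1)
        # Criterion shares true_r of its standard-score variation with the assessment.
        criterion = true_r * assessment + math.sqrt(1 - true_r ** 2) * rng.gauss(0, 1)
        pairs.append((assessment, criterion))
    full_r = pearson(*zip(*pairs))
    # Hypothetical selection: keep only the highest scorers on the assessment.
    pairs.sort(key=lambda p: p[0], reverse=True)
    kept = pairs[: int(n * keep_top)]
    restricted_r = pearson(*zip(*kept))
    return full_r, restricted_r


full_r, restricted_r = simulate()
print(f"full group r = {full_r:.2f}, selected group r = {restricted_r:.2f}")
```

Under these assumptions the correlation observed in the selected group is markedly lower than that in the full group, which is the sense in which uncorrected restriction of range understates the accuracy of assessment.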

As will be seen in the ensuing section of 
this report, many of the considerations dis- 
cussed above were taken into account in 
designing the assessment phase of the Study. 
The “model” employed was the one de- 
veloped by the OSS (1948), modified to fit 
the requirements of the Study. 

Because the focus of the Study is not on 
assessment per se, no attempt has been 
made to test out alternative approaches to 
assessment. The assessment procedures were 
modified somewhat, however, in light of 
initial experience with them. Following 
initial changes the remainder of the assess- 
ment work was carried out with standard- 
ized procedures. 


This report is directed at presenting the 
early findings on assessment procedures de- 
riving from the Management Progress 
Study. Future reports will present addi- 
tional results. Covered in this report are: 

1. Descriptions of the assessment pro- 
cedures. 

2. Analyses of the assessment staff evalu- 
ations. 

3. Contributions to the assessment proc- 
ess of selected techniques. 

4. Relationships of assessment data to
progress in management over relatively 
short time periods. 


ASSESSMENT PROCEDURES 


The subjects of the Management Progress 
Study were assessed at the time of their inclusion 
in the Study. The purpose of the assessment was 
to measure personal characteristics hypothesized 
to be of importance either in developmental 
change in early adulthood or success in business 
management. 

The sample of 422 men were at the time of 
assessment employees of six Bell System telephone 
companies. Approximately two-thirds of the sam- 
ple were college graduates who were assessed 
soon after employment. The remaining third had 
been employed initially for nonmanagement posi- 
tions and had advanced into management rela- 
tively early in their careers. These men were not 
college graduates when employed; a few had since 
earned degrees by part-time and evening study. 

The first step in designing the assessment proce- 
dures was that of selecting and defining those 
characteristics to be assessed. In doing this a 
thorough review of the literature was supple- 
mented by securing the judgments of experienced 
Bell System personnel men as to the qualities 
they believed to be most important in success in 
the business. The many characteristics which this
process produced were finally reduced to a list 
of 25 qualities. Techniques (described below) 
were then selected or developed to reveal these 
variables. 

The assessment staffs consisted primarily of 
professionally trained persons, though as the Study 
progressed a few telephone company managers 
served on the staffs. None of the latter, incidentally,
came from the company of the men being
assessed. 

The subjects spent 3½ days at the assessment
center in groups of 12. Immediately following, the 
assessment staff conducted extensive discussions 
of each participant and rated each on the 25 
characteristics. At a later date a narrative sum- 
mary of each man’s performance was prepared. 

Assessment of the subjects was spread over 
several summers. The first ones assessed, all the 
college graduates employed by one telephone 
company in that year, were assessed during the
summer of 1956. The last to be assessed were 
processed during the summer of 1960. Modifica- 
tions in the methods used were made subsequent 
to the 1956 assessment. Thereafter, the methods 
remained standard. 


Techniques


The methods used for collecting information on 
the personal characteristics of the participants 
are representative of those used generally in as- 
sessment activities. A listing of the techniques 
with a brief description of each follows: 

Interview. A 2-hour interview with each man 
directed at obtaining insights into his personal 
development up to that time, work objectives, 
attitudes toward the Bell System, social values, 
scope of interests, interpersonal relationships, 
idiosyncrasies, etc. 

In-Basket. A set of materials which a telephone
company manager might expect to find in his in-basket.
The items, 25 altogether, range from
telephone messages to detailed reports. In addition,
the examinee was furnished with such necessary
materials as a copy of the union contract, organization
chart, and stationery. He was given 3 hours
in which to review the materials and take appropriate
action on each item (by writing letters,
memos, and notes to himself). Following completion
of the “basket” he was interviewed concerning
his approach to the task, his reasons for taking
the actions indicated, and his views of his
superiors, peers, and subordinates (as inferred 
from the materials). 

Manufacturing Problem (made available by 
John Hemphill of the Educational Testing Serv- 
ice). A small-business game wherein the partici- 
pants assumed the roles of partners in an enter- 
prise manufacturing toys for the Christmas trade. 
The participants were required to buy parts and 
sell finished products under varying market con- 
ditions, to maintain inventories, and to manufac- 
ture the toys. 

Group Discussion. Also a leaderless group situ- 
ation, focused around a management personnel 
function. Participants were instructed to assume 
the roles of managers, each having a foreman re- 
porting to him considered capable of promotion. 
Participants were required to discuss the merits 
and liabilities of their hypothetical foremen and 
to reach a group decision regarding their relative 
promotabilities. 

Projectives. (a) Rotter Incomplete Sentences 
Blank (published by the Psychological Corpora- 
tion). (b) Bell Incomplete Sentences Test (by 
Walter Katkovsky and Vaughn Crandall with the 
advice and assistance of Julian Rotter). (c) The- 
matic Apperception Test (published by Harvard 
University Press). Six of the cards from this test 
were administered. 

Paper-and-pencil tests and questionnaires. (a)
School and College Ability Test, Form 1 (published
by the Cooperative Test Division of the
Educational Testing Service). (b) A Test of
Critical Thinking in Social Science (American
Council on Education, 1951). Now out of print,
this test was designed to measure several aspects
of critical thinking, including the ability to define
problems, select pertinent information, recognize
unstated assumptions, evaluate hypotheses, and
make valid inferences. (c) Contemporary Affairs
Test (formerly published by the Cooperative
Test Division of the Educational Testing Service).
Annually since 1956 the Personnel Research
Section of AT&T has developed, following
the Educational Testing Service format, its own
version of this test. (d) Edwards Personal Preference
Schedule (published by the Psychological
Corporation). (e) The Guilford-Martin Inventory
of Factors GAMIN (published by the Sheridan
Supply Company). (f) Opinion Questionnaire,
Form B. Unpublished, this questionnaire was
adapted from Bass (1955) and yields three scores:
authoritarianism (A), acquiescence (a), and
negativism (n). (g) Survey of Attitudes Toward
Life. Unpublished, this questionnaire, made available
to the Bell System by Irving Sarnoff of New
York University, is designed to reflect a person’s
attitudes toward making money and advancing
himself.

Miscellaneous. (a) Personal history question- 
naire. (b) Short autobiographical essay. (c) Q 
sort (70 items, self-descriptive). 


Administration and Reporting 


All of the assessment techniques were administered
according to standard instructions by the
staff member or members responsible for each
method. Naturally, the ways in which this was
accomplished varied according to the nature of
the technique. Thus all interviews were conducted
by individual staff members with individual
assessees. All tests, questionnaires, and situational
exercises were administered to groups of participants.
The two group problems involved six
participants at a time. Two staff members observed
each group problem, recorded their impressions
of each participant, and independently
evaluated the performance of each.

For each method one or more of the staff prepared
written reports. The interviewers dictated,
as nearly verbatim as possible, reports of their
interviews. One staff member reviewed each
completed In-Basket, along with the accompanying
interviewer's report on the handling of the “basket,”
and prepared reports describing how each man dealt
with the materials in the exercise and evaluating
his effectiveness in so doing. Similarly, one of the
observers for each group problem prepared reports
describing and evaluating individual performance in
these exercises, including ratings and rankings by
both peers and observers. A clinically trained
psychologist reviewed the projective protocols and
prepared individual reports. The paper-and-pencil
tests were scored and summarized.
Rating Variables


The personal characteristics selected for evaluation
reflect varied aspects of what could be referred
to as “criterion” performance, broadly conceived.
Some are directly related to managerial
functions (e.g., organizing, planning, decision
making, problem solving). Others refer to interpersonal
relationships and influence (e.g., communications
skills, personal impression, sensitivity,
dependence on others). Still others relate to general
abilities (e.g., intellectual ability, adaptability).

Motives, values, and attitudes are covered by 
several of the variables. Included are attitudes 
toward the importance of work and toward work- 
ing for a large company. Social attitudes are 
included as are desires for advancement and se- 
curity. Personal goals, self-evaluations, and ex- 
pectations also were evaluated. 


Staff Evaluations 


Immediately following the 3½ days of collecting
information on the subjects, the assessment
staff, consisting usually of nine persons, assembled,
reviewed, and discussed the results.
Each man assessed was evaluated separately, 1 to
1½ hours being required per man.

A typical evaluation consisted of first reading 
the man’s short autobiographical essay. An inter- 
viewer then read a summary of his interview 
with the man. Reports on the In-Basket, group 
exercises, paper-and-pencil test scores, and pro- 
jectives followed, concluding with a reading of 
the Q-sort items the man selected as “most” and 
“least” like him. 

Following presentation of the reports each staff 
member independently rated the man on each of 
the 25 characteristics (from 1 [low] to 5 [high]). 
Each of the variables then was reviewed. Where 
differences of opinion occurred the evidence was 
discussed and staff members permitted, though 
not required, to adjust their ratings. 

After the variables had been rated, the staff 
evaluated the man’s potential as a management 
person in the Bell System. Separate judgments 
were recorded, independently, regarding the 
man’s likelihood of remaining in the Bell System 
and, assuming that he would remain, of achieving 
middle management in 10 or less years. In addi- 
tion, the staff noted their judgments as to whether 
the man “should” advance to middle management. 
Again, the ratings of potential were discussed and 
adjusted where staff members wished to do so. 

For analysis purposes all of the data thus collected
were filed, using code numbers to protect
anonymity, at the Fels Research Institute at
Antioch College. To obtain a consensus rating
on each variable, the individual staff ratings were
averaged. The ratings of potential, however, were
trichotomized in order to reflect the extent of
agreement by the staff (No, ?, Yes). In the analyses
to be presented these consensus ratings were
used.
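
The consensus procedure just described can be illustrated with a minimal sketch in Python. The averaging step follows the text directly. The cut points used to trichotomize the judgments of potential are not reported, so the two-thirds agreement threshold below, like the function names and the sample ratings, is purely an assumption made for illustration.

```python
from statistics import mean


def consensus_rating(ratings):
    """Average the 1-to-5 ratings given by the staff members on one variable."""
    return mean(ratings)


def trichotomize(judgments, threshold=2 / 3):
    """Collapse yes/no staff judgments of potential into 'Yes', '?', or 'No'.

    ASSUMPTION: the two-thirds agreement threshold is hypothetical; the
    report says only that the categories reflect the extent of agreement.
    """
    share_yes = sum(judgments) / len(judgments)
    if share_yes >= threshold:
        return "Yes"
    if share_yes <= 1 - threshold:
        return "No"
    return "?"


# Hypothetical ratings from a nine-member staff on a single variable.
print(consensus_rating([3, 4, 4, 5, 3, 4, 4, 3, 4]))

# Hypothetical judgments: seven of nine staff members predict advancement.
print(trichotomize([True] * 7 + [False] * 2))
```

Under this assumed rule a man receives “?” whenever neither judgment commands two thirds of the staff, which is one way of letting the category express disagreement rather than a middling evaluation.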


ANALYSES OF THE STAFF EVALUATIONS 


To date, three kinds of studies of the 
assessment process have been undertaken, 
namely: 

1. Studies of the interrelations between 
the rating variables and their relationships 
to the overall predictions of the assessment 
staff. 

2. Studies of the assessment techniques 
with particular reference to their contribu- 
tions to the assessment process. 

3. Studies of relationships between staff 
evaluations and such behavioral criteria as 
survival in the business and progress in 
management. 

Analyses of assessment staff evaluations 
have been made by intercorrelating the 
ratings on the 25 variables, correlating the 
ratings with the overall predictions of the 
staff, and factoring the intercorrelations 
between the variables. The results of the 
latter are presented in the following pages. 

The reasons for factoring the rating
variables are twofold. In the first place it
was expected that the factorial results
would clarify the nature of the
judgments made. Mere inspection of the
25 variables undoubtedly would suggest
many of the underlying constructs employed
in making the judgments. Factorial results,
however, should permit greater precision in
interpreting the variables and also make it
possible to avoid misinterpretations.

Factorial results also can provide a useful
method for organizing ratings for subsequent
analyses. Composite scores based
on factors are presumably more reliable
than those based on individual variables
and, being fewer in number, can be more
efficiently utilized.


Method 


The entire sample of 422 men was divided according
to educational background at time of
employment. The ratings for all who were not
college graduates at time of employment (N =
148) and those for 207 of the college graduates
were used in the analyses. (Because of revisions
in the assessment methods, ratings for 67 college
graduates from the first telephone company involved
in the study were excluded from the analyses.)
Separate analyses were made for each sample.

For each sample the rating variables and the 
staff prediction regarding the likelihood of pro- 
gressing to middle management in 10 or less years 
were intercorrelated. The product-moment corre- 
lations computed are presented in Appendix A. 
The resulting matrices were then factored by a 
method developed by Wherry (1959) for deter- 
mining a hierarchical factor solution. Though the 
method is a general one it appears particularly 
relevant to rating data, which characteristically 
are influenced by “halo effects.” The higher-order 
factors obtained presumably reflect the latter, 
whereas the lower factors reflect more specific 
judgments by the raters. 


Results 


The factorial solutions yielded 11 factors 
for the college graduate sample and 8 fac- 
tors for the nongraduate sample. The load- 
ings, paired for comparable factors, are 
shown in Appendix B. That the results ac- 
count for relatively large shares of the total 
variances of the ratings for both samples is 
indicated by the magnitude of the com- 
munalities. The average communality for 
the college group is .64, that for the non- 
college being .57. 

Both solutions yielded higher-order factors.
For the noncollege sample a single
"general" factor was obtained (Factor I).
This factor separated into three higher-order
factors for the college sample (a
third-order general factor and two second-order
subgeneral factors, Factors II and III).

Factor I has similar patterns of loadings
in both samples. Indeed, Factor I
could be described as reflecting the assessment
staff’s “model” for managerial potential
(the loadings of the staff predictions
being highest on this factor). In general, a
man rated high on this factor was seen as
effective in organizing, planning, and decision
making, likely to solve a management
problem in a novel way, skillful in dealing
with others, resistant to stress, above average
in intellectual competence, able to communicate
orally, energetic, and perceptive
of the behavior of others. Motivationally,
he was seen as desirous of advancing in the
management hierarchy, having high standards
of work performance, not particularly
concerned with job security, and relatively
independent of the approval of his peers
and superiors.

Factor II (college sample only) also is 
indicative of managerial potential (having 
nearly as high a loading on the staff predic- 
tion as does Factor I). Those achieving high 
ratings on this factor are similar in many 
characteristics to the more highly rated on 
Factor I. They differ chiefly in the motiva- 
tional areas, being less likely to value ad- 
vancement in the management hierarchy 
and more likely to value high performance 
standards in the work itself. They also may 
or may not be dependent on the approval of 
their peers and superiors and may or may 
not value a secure job. They are somewhat 
more likely to be skilled in dealing with 
others though less likely to stand up under 
stress. 

Factor III (college sample only) has its
highest loadings on several motivational
variables. Those evaluated high on this
factor are characterized by willingness to
postpone rewards, particularly advancement
in the organization, along with wanting
a secure job. They seek the approval of
their peers and superiors and are likely to
incorporate company values. Because such
needs appear indicative of passivity (avoiding
competition and risk taking) on the one
hand and dependency (desiring support
from others) on the other, this factor is
named “passive dependency.”

It is not self-evident from the data
why a single higher-order factor resulted
from the analysis of the correlations from
the noncollege sample whereas three such
factors were generated by the analysis of
the college sample matrix. One can speculate
that the college sample was more homogeneous
with respect to the characteristics
evaluated, though further study would
be required to verify this speculation.

From the analyses eight first-order factors
were determined for the college sample
and seven for the noncollege sample. These
factors reflect the more specific judgments
of the assessment staff. Summary descriptions
of each factor, including variables
with the highest loadings, follow:


Range of interests 


Need for security (negative) 
Factor  Sample   Variables                                  Name

IV      Both     Organizing and planning                    Administrative skills
                 Decision making
V       Both     Human relations skills                     Interpersonal skills
                 Behavior flexibility
                 Personal impact
VI      Both     Tolerance of uncertainty                   Control of feelings
                 Resistance to stress
VII     Both     Scholastic aptitude                        Intellectual ability
                 Range of interests
VIII    Both     Primacy of work                            Work-oriented motivation
                 Inner work standards
IX      Both     Ability to delay gratification             Passivity
                 Need for security
                 Need for advancement (negative)
X       Both     Need for superior approval                 Dependency
                 Need for peer approval
                 Goal flexibility
XI      College  Social objectivity                         Nonconformity
                 Need for superior approval (negative)
                 Bell System value orientation (negative)


The relative importance of the judgments
by the assessment staff of general
effectiveness is apparent from the analyses.
Factors I and II account for 30% of the
average total variance (47% of the accounted-for
variance) in the college sample
while Factor I accounts for 26% of the
average total variance (45% of the accounted-for
variance) in the noncollege
sample. In brief, as might be expected, the
raters were influenced by overall impressions
of the men assessed. The method of
making the ratings, across the variables,
might be expected, of course, to enhance
any halo tendencies that were present.
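
The relation between the two percentages quoted for each sample can be checked with a few lines of Python. The sketch assumes that the accounted-for variance equals the average communality reported earlier (.64 for the college sample, .57 for the noncollege sample); the function name is hypothetical.

```python
def share_of_accounted_variance(share_of_total, average_communality):
    """Express a factor's share of total variance as a share of the
    variance accounted for by all factors (the average communality)."""
    return share_of_total / average_communality


# Published figures: share of average total variance and average communality.
college = share_of_accounted_variance(0.30, 0.64)
noncollege = share_of_accounted_variance(0.26, 0.57)

print(round(college, 2), round(noncollege, 2))
```

The college figure reproduces the reported 47%. The noncollege figure comes out nearer 46% than the reported 45%, a difference small enough to arise from rounding in the published inputs.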

Further evidence regarding the weight
given overall impact is obtained from inspection
of the loadings on the staff predictions
of potential (will reach middle management
in 10 or less years). It is apparent
that the general factors (I and II) are the
most heavily weighted on this variable. For
the college sample these factors account
for 61% of the total variance (73% of the
accounted-for variance) of the staff predictions
while for the noncollege sample Factor
I accounts for 42% of the total variance
(64% of the accounted-for variance). Much
lesser weights were obtained for interpersonal
skills, intellectual ability, and nonconformity
(college sample) and intellectual
ability and passivity (noncollege sample),
while the remaining lower-order factors
have practically zero weights. Thus in making
predictions of progress in the management
hierarchy the assessment staffs evidently
were primarily influenced by their
overall judgments and secondarily by more
specific evaluations.

Despite such tendencies, however, the 
assessment staffs were able to make many 
discriminations on more specific variables. 
Eight first-order factors for the college 
sample and seven for the noncollege sample 
are relevant evidence as is the fact that 
for both samples over half of the average 
accounted-for variance can be ascribed to 
the more specific factors. 

Though there are some discrepancies, the
consistency in the factor structure from
sample to sample is quite high. Because the
assessment process was the same for both
samples it is hardly surprising, despite
differences in the educational backgrounds
of the men assessed, that the raters were
influenced by similar considerations.

The nature of the more specific factors is
of interest. Three reflect abilities (administrative
skills, interpersonal skills, and intellectual
ability) whereas five reflect temperament
and motivation (control of feelings,
work-oriented motivation, passivity, dependency,
and nonconformity). The analyses
make it apparent that the methods
used in observing the men and the variables
employed in evaluating performance permit
consideration of a wide range of characteristics.

The factorial results also help to clarify 
the constructs used by the staff evaluators. 
Mere inspection of the variables would 
suggest such factors as administrative skills, 
intellectual ability, and control of feelings. 
Whether factors like passivity and noncon- 
formity would result from such an inspec- 
tion is doubtful. Furthermore, some of the 
variables would have been difficult to evalu- 
ate. 

The variable “goal flexibility” can be
used to exemplify this point. One might not
hypothesize that ratings on this variable
reflect dependency needs. The factorial results,
however, suggest such an interpretation.
It seems reasonable to hypothesize
that persons rated high on the variable
adapt their goals to the expectations of
other people, whereas those low on the
variable, the more “inner directed,” persist
in goals set for themselves.

As previously pointed out the factorial 
results have proven a useful method for 
organizing the data. Composite scores for 
each factor were developed by selecting var- 
iables with the higher loadings on each fac- 
tor (generally .30 or higher) and simply 
adding the variable scores. The resulting 
values are not “factor scores” because no 
attempt was made to partial out other fac- 
tors, particularly the general ones, which 
contribute to each score. It was decided, 
however, that the composite scores thus de- 
rived would tend to reflect the underlying 
factors. Applications of these scores are 
discussed in the next section of this report. 
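
A minimal sketch of this unit-weighted scoring is given below in Python. The loadings and ratings are invented for illustration, the function name is hypothetical, and the treatment of negatively loaded variables, which the text does not describe, is simply left out.

```python
def composite_scores(ratings, loadings, cutoff=0.30):
    """Sum a man's ratings on the variables loading at or above the cutoff.

    ratings:  {variable: consensus rating}
    loadings: {factor: {variable: loading on that factor}}
    These are unit-weighted composites, not factor scores: nothing is
    partialled out, so general factors still contribute to every sum.
    """
    scores = {}
    for factor, variable_loadings in loadings.items():
        selected = [v for v, load in variable_loadings.items() if load >= cutoff]
        scores[factor] = sum(ratings[v] for v in selected)
    return scores


# Hypothetical loadings and ratings; the values are not taken from the study.
example_loadings = {
    "Administrative skills": {
        "Organizing and planning": 0.55,
        "Decision making": 0.48,
        "Range of interests": 0.10,
    },
}
example_ratings = {
    "Organizing and planning": 4,
    "Decision making": 3,
    "Range of interests": 5,
}

print(composite_scores(example_ratings, example_loadings))
```

In this invented case only the first two variables pass the cutoff, so the composite is their plain sum.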


ANALYSES OF TECHNIQUES 

Several studies have been made of the
techniques used in collecting information on
the personal characteristics of the participants.
Particular attention has been given
to the contributions the methods have made
to the evaluations by the assessment staff.
Not all of the methods used have been
studied, though eventually it is planned
to make the coverage as complete as possible.
The methods studied to date include
the more directly scorable; that is, the
group exercises, In-Basket, mental ability
tests, and personality questionnaires. Major
omissions are the less easily quantified
interviews and projective instruments.


Group Exercises 


Each of the group exercises (Manufac- 
turing Problem and Group Discussion) was 
observed by two members of the assessment 
staff. These observers made notes from 
which a report was prepared on each partic- 
ipant for presentation to the assessment 
staff. The observers also independently 
rated and ranked each participant on his 
overall contribution to the problem. In 
addition, for the Group Discussion only, the 
observers rated each man on his effective- 
ness in oral presentation. 

Additional ratings and rankings were 
obtained from each participant who evalu- 
ated his own performance (self-rating and 
-ranking) and the performance of each man 
in the group (peer rating and ranking) on 
his overall contribution to the problem. For 
analysis purposes the peer evaluations were 
averaged. Furthermore, as indications of 
“self-objectivity,” difference scores between 
each self-rating and -ranking and the cor- 
responding average peer rating and ranking 
were ascertained. 
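
The “self-objectivity” score lends itself to a short illustration in Python. The direction of the subtraction (self minus average peer) is an assumption, since the text specifies only an algebraic difference; the function name and the ratings are likewise hypothetical.

```python
from statistics import mean


def self_objectivity(self_score, peer_scores):
    """Algebraic difference between a man's self-evaluation and the average
    evaluation given him by his peers.

    ASSUMPTION: computed as self minus peer average, so that a positive
    value means he rated himself above the level his peers assigned.
    """
    return self_score - mean(peer_scores)


# Hypothetical case: one man in a six-man group, rated by his five peers.
print(self_objectivity(4, [3, 3, 4, 2, 3]))
```

The same function applies to rankings, although the sign then reads in the opposite sense if a rank of 1 denotes the best performance.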

In summary, the following scores were 
obtained on each participant and presented, 
along with a descriptive report, to the as- 
sessment staff at its evaluation meeting: 

Manufacturing Problem

  Ratings on overall contribution:
    Observers (independent and average)
    Peers (average)
    Self
    Algebraic difference, self and average peer

  Rankings on overall contribution:
    Observers (independent and average)
    Peers (average)
    Self
    Algebraic difference, self and average peer

Group Discussion

  Ratings on oral presentation:
    Observers (independent and average)

  Ratings on overall contribution:
    Observers (independent and average)
    Peers (average)
    Self
    Algebraic difference, self and average peer

  Rankings on overall contribution:
    Observers (independent and average)
    Peers (average)
    Self
    Algebraic difference, self and average peer

The various scores resulting from the 
two group problems were analyzed by cor- 
relational methods in order to ascertain: 

1. The extent of agreement between rat- 
ers (observers, peers, and self). 

2. The extent of overlap between meth- 
ods of evaluation (ratings and rankings). 

3. The extent of overlap between evalua- 
tions of performance in the two exercises. 

4. The extent to which the exercises con- 
tributed to the evaluations of the men as- 
sessed (ratings by the assessment staff). 

In determining rater agreement, overlap 
between methods, and overlap between ex- 
ercises the results for 355 participants in 
five companies were combined. Product- 
moment correlations between the variables 
were computed for each company sample. 
The correlations were then averaged (after 
converting to z’s) in order to obtain esti- 
mates for the entire sample. 
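
The averaging of correlations by way of z can be sketched in Python as follows. The five company correlations are invented, and the use of a simple unweighted mean is an assumption, since the text does not say whether the company samples were weighted by size.

```python
import math


def average_correlation(correlations):
    """Average correlations through Fisher's z transformation.

    Each r is converted to z = atanh(r), the z values are averaged, and
    the mean z is converted back to r with tanh.
    ASSUMPTION: an unweighted mean is used.
    """
    z_values = [math.atanh(r) for r in correlations]
    mean_z = sum(z_values) / len(z_values)
    return math.tanh(mean_z)


# Hypothetical correlations for the same pair of variables in five companies.
company_rs = [0.55, 0.62, 0.60, 0.66, 0.58]
print(round(average_correlation(company_rs), 2))
```

Averaging in the z metric rather than averaging the r values directly avoids the slight downward bias that arises because the sampling distribution of r is skewed when the correlation is far from zero.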


Rater Agreement 


The extent of agreement between raters
is shown in Table 1. The evaluations (ratings
and rankings) of Observer 1 were correlated
with those of Observer 2. In addition,
the averaged ratings and rankings of
the two observers were correlated with the
averaged ratings and rankings of the peers
and the self-ratings and -rankings. Finally,
the averaged peer ratings and rankings were
correlated with the self-evaluations.


TABLE 1
RATER AGREEMENT

                          Observer 1    Observers    Observers    Peers
                             with          with         with       with
                          Observer 2      peers         self       self
Manufacturing Problem
  Overall rating             .60           .64          .47        .45
  Overall ranking            .69           .59          .43        .38
Group Discussion
  Overall rating             .75           .73          .55        .51
  Overall ranking            .75           .69          .50        .45

For both exercises the correlations are 
positive and relatively high. The agreement 
between raters tends to be higher, in gen- 
eral, for the Group Discussion than for the 
Manufacturing Problem. The self-ratings 
tend to correlate lower with the peer and 
observer ratings than do the observers with 
each other or with the peers. 

The need for multiple observers is indi- 
cated by the magnitude of agreement be- 
tween the two observers. Though relatively 
high, it is not sufficiently high to warrant 
dispensing with either of the observers. 

The reasonably high agreement between 
the different sources of ratings suggests 
that all were reacting to many of the same 
aspects of individual performance. Evi- 
dently, the objective aspects of both exer- 
cises are sufficiently apparent to observers 
and participants alike to influence them 
similarly in arriving at their evaluations. 


Method Agreement 


The extent of agreement between the rating
and ranking methods of evaluation is
presented in Table 2. For both the observers
and peers the average ratings were correlated
with the average rankings.


TABLE 2
METHOD AGREEMENT

                          Observers    Peers    Self
Manufacturing Problem        .89        .84      .64
Group Discussion             .89        .88      .42

As might be expected, the correlations 
are relatively high. The ratings and rank- 
ings yield quite similar results. Here again, 
however, the correlations for self-judgments 
are lower. 


Overlap of Exercises 


Table 3 shows the relationships between 
the evaluations of performance in the two 
exercises. 
TABLE 3
CORRELATIONS BETWEEN MANUFACTURING
PROBLEM AND GROUP DISCUSSION

                    Observers    Peers    Self
Overall rating         .45        .92      .98
Overall ranking        .41        .46      .38


The relationships shown are positive and
fairly high. The two exercises elicited some
common aspects of behavior in spite of the
different nature of the two techniques, one
a small-business game and the other a group
discussion. Yet the size of the correlations
also indicates that each exercise makes a
unique contribution.


Oral Communications Skills 


As noted previously, an additional evaluation
was obtained from the Group Discussion.
The observers rated each participant
on his performance in making an oral
presentation. The relationships of these ratings
to selected variables are presented in
Table 4. The ratings were averaged prior
to computing the correlations with other
variables.

It will be noted that the agreement be- 
tween observers (.54) is markedly lower 
than the .75 shown in Table 1 for total con- 
tribution to the exercise. Performance over 
an hour-long period of group interaction 
appears easier to judge than a short talk. 
The lower reliability of these ratings un- 
doubtedly reduces the correlation with over- 
all performance in the group discussion, but, 
even so, it is apparent that skill in oral 
presentation is only one factor in effective- 
ness in the discussion problem. 


TABLE 4
ORAL COMMUNICATIONS SKILLS
CORRELATIONS OF OBSERVER RATINGS

                                      r
Observer 1 with Observer 2           .54
Group Discussion
  Observer ratings (overall)         .54
  Observer rankings (overall)        .52
  Peer ratings                       .43
  Self-ratings                       .29
Manufacturing Problem
  Observer ratings                   .31


Contributions to the Staff Evaluations 


In order to assess the contributions of the 
group exercises to the total assessment 
process the ratings of performance in each 
exercise were correlated with the final 
judgments of the assessment staffs. These 
judgments are reflected in the scores based 
on the factors previously described and in 
the prediction of advancement potential. 
The factors, as will be recalled, are as fol- 
lows: 


Factor                          Identification
I                               General effectiveness
II (college graduates only)     General effectiveness
III (college graduates only)    Passive dependency
IV                              Administrative skills
V                               Interpersonal skills
VI                              Control of feelings
VII                             Intellectual ability
VIII                            Work-oriented motivation
IX                              Passivity
X                               Dependency
XI (college graduates only)     Nonconformity


The correlations of the various ratings for 
the group exercises and the staff judgments 
are shown in Table 5. The college and non- 
college samples are treated separately. 

In general, with a few exceptions, the
observer ratings correlate highest with the
factor scores derived from the staff ratings.
The average peer ratings are next highest
while the self-ratings have the lowest correlations.
The self-rating correlations are
notably low in many instances.

For the college graduate sample the cor- 
relations of the staff ratings with observer 
ratings in the group exercises are noticeably 
higher for the Group Discussion than for 
the Manufacturing Problem. The staff 
seems to have "gotten more" out of the dis- 
cussion. For the noncollege sample the very 
small differences that exist favor the Manu- 
facturing Problem. 

The correlations of the ratings and rankings
from the two exercises are nearly
always higher with the general evaluations
of the candidates (Factors I and II and
staff prediction) than with the more specific
factors. This is perhaps not surprising
since what is rated in the group problems is
general effectiveness.


TABLE 5
CORRELATIONS OF GROUP EXERCISES WITH STAFF JUDGMENTS

                                                                Factor                                Staff
                                            I   II  III   IV    V   VI  VII VIII   IX    X   XI  prediction
College sample (N = 207)
  Manufacturing Problem
    Observer rating                        44   42  -29   31   39   37   18   30  -35  -18   19          41
    Peer rating                            33   31  -24   24   29   22   11   28  -28  -15   09          39
    Self-rating                            15   07  -23   03   11   13   04   10  -26  -17   10          11
  Group Discussion
    Observer rating (overall)              67   67  -38   48   62   47   27   45  -39  -22   24          60
    Observer rating (oral presentation)    56   60  -24   42   52   32   26   34  -29  -07   17          49
    Peer rating                            58   57  -35   41   52   36   31   40  -37  -20   31          52
    Self-rating                            29   26  -24   19   23   21   10   18  -28  -14   20          18
Noncollege sample (N = 148)
  Manufacturing Problem
    Observer rating                        60             51   52   35   20   39  -34  -17               42
    Peer rating                            51             48   47   27   22   20  -31  -16               40
    Self-rating                            38             37   23   13   10   19  -43  -08               31
  Group Discussion
    Observer rating (overall)              57             41   45   36   15   36  -36  -21               38
    Observer rating (oral presentation)    56             42   47   20   12   36  -41   00               47
    Peer rating                            56             44   47   24   17   33  -38  -17               42
    Self-rating                            42             33   27   31   11   22  -30  -33               34

Because the two exercises are group-interaction
exercises it might be expected that
among the more specific factors the
“interpersonal skills” factor scores would show
the highest correlations with the observers’
performance ratings. This proved to be the
case, but there were almost equally large
correlations with other factors, including
Administrative skills, Control of feelings,
Work-oriented motivation, and Passivity.
The group exercises apparently yielded evidence
on multiple aspects of performance.


In-Basket 


Because the In-Basket exercise was not 
“scored” in any way during assessment a 
method for quantitatively evaluating the 
reports for research purposes was evolved. 
This step was essential for determining 
relationships between the results of the 
exercise and the later judgments of the 
assessment staff. 


The method used consisted simply of 
asking two members of the research staff 
to independently read each narrative report 
which had been prepared at the assessment 
center and rate overall performance on a 
5-point scale. A score of 3 indicated an 
average performance, 1 and 2 below-aver- 
age performance, while 4 and 5 signified 
above-average performance. Following the 
independent ratings a composite rating for 
each man, resolving discrepancies, was 
reached by mutual agreement. 

To obtain an estimate of rater agreement 
the correlation between the independent 
evaluations of the raters was determined. 
An r of .92 resulted, indicating a high degree 
of agreement. Both raters stated that with 
a few exceptions the reports clearly evalu- 
ated the performances of the men assessed, 
so that there was little difficulty in assigning 
the ratings. 
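
The rater-agreement estimate reported above is a product-moment correlation between the two sets of independent ratings. A minimal sketch in Python follows; the eight pairs of ratings are invented for illustration and the function name is hypothetical.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists of ratings."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical 5-point ratings of eight In-Basket reports by two raters.
rater_a = [3, 4, 2, 5, 1, 3, 4, 2]
rater_b = [3, 4, 2, 5, 2, 3, 4, 2]

print(round(pearson_r(rater_a, rater_b), 2))
```

In this invented example the raters disagree on a single report by one scale point, which is enough to keep the coefficient below 1.00 while leaving it very high.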

The results of correlating the ratings of
the In-Basket reports with the assessment
staff evaluations are shown in Table 6.


TABLE 6
CORRELATIONS OF IN-BASKET
WITH STAFF JUDGMENTS

                               College       Noncollege
                                  r              r
Staff prediction                 .55            .51
General effectiveness (I)        .60            .59
Administrative skills        [illegible]        .68
Interpersonal skills             .45            .49
Intellectual ability             .36            .27
Control of feelings              .89            .24
Work-oriented motivation     [illegible]        .26
Passivity                       -.18           -.27
Dependency                      -.15            .00
Nonconformity                [illegible]


The In-Basket is primarily an administrative
exercise, and the table shows that
the technique has its highest correlations
with staff judgments of administrative
skills. In contrast to the results for the
group exercises, these correlations are
higher than those with the more general
factors. This suggests that the In-Basket is
a somewhat more “focused” technique than
the group problems. Nevertheless, there are
also substantial correlations with several
factors other than administrative skills.


Mental Ability Tests 


The three mental ability tests used in
assessing participants were employed very
specifically by the staff in making its judgments.
Judgments of “scholastic aptitude”
were based on scores on the School and
College Ability Test (SCAT) and the Test of
Critical Thinking in Social Science. Scores
on the Contemporary Affairs Test were
taken into account in evaluating “range of
interests.”

Correlations of the scores on these tests 
with the staff evaluations appear in Table 7. 
The results for the college graduate (C) and 
the noncollege (NC) samples are shown 
separately. 

It is hardly surprising to note that the
correlations of the three tests with
Intellectual ability are generally high. For the
noncollege sample the correlations of SCAT
and Critical Thinking with Administrative
skills also are high, probably because the
variable “scholastic aptitude” was included
in the scoring of this factor for that sample,
though it was not so included for the
college graduate sample.

The correlations of the tests with the
Staff prediction and General effectiveness
are relatively high, with the SCAT Verbal
tending to have the highest correlations of
the various scores across samples. Though
the mental ability tests make important
contributions to the staff judgments, it is
apparent that there is much variance in the
judgments that cannot be explained by
scores on these tests.


TABLE 7
CORRELATIONS OF MENTAL ABILITY TEST SCORES WITH STAFF JUDGMENTS

                          School and College Ability Test     Contemporary   Critical Thinking
                          Verbal    Quantitative    Total     Affairs        in Social Science
                          C   NC    C   NC          C   NC    C   NC         C   NC
Staff prediction 36 44 06 29 of P FH 2 » Fi 
General effectiveness (I) 47 50 14 39 39 p e ot A A 
Administrative skills 37| 6| 16| 58) 34) 7 T n 20 28 
Interpersonal skills 22| 939 —03| 15) 13) 22 5 2 65 54 
Intellectual ability 79 &| 2)| 46 | T0 62 2 23 16 27 
Control of feelings 23 32| 09] 20) 21 e e 02 14 17 
Work-oriented motivation} 17| 14| 05| 22) 14 E panim 16. | -22 
Passivity -16| 21 | 00 -17| -10| -a1| -10 | -9 | ~18 | -3 
Deen “iz | —16 | -20 | —15 -M -15| —14 | 29 | 719 | 70 
Nonconformity 48 le cong sei br a i 


Note. C = college graduate; NC = noncollege.


Personality and Attitude Questionnaires


With one exception scores on the per- 
sonality and attitude questionnaires were 
used much more generally than were the 
mental ability test scores in judging the 
participants. The scores (24 in all) were 
read at the staff evaluation meetings and 
each staff member was expected to draw his 
own inferences from the results. The excep- 
tion was the Authoritarianism score from 
the Opinion Questionnaire which specifi- 
cally influenced judgments of “social ob- 
jectivity.” 

Correlations of the questionnaire scores 
with the staff judgments are shown in Table 
8 for the college graduate and noncollege 
samples. Only the higher correlations (.20 
or greater for either sample) are shown. 
The questionnaires are coded as follows: 

PPS: Edwards Personal Preference Schedule

GAMIN: Guilford-Martin Inventory of Factors GAMIN

OQ: Opinion Questionnaire

SATL: Survey of Attitudes Toward Life

In general, the correlations of the
personality and attitude measures are distinctly
lower than are those for the mental ability
tests. As would be expected, the
questionnaire scores have their highest correlations
with the motivational variables. The scores
which correlate .20 or higher most often are
the Edwards dominance and abasement
scales and the Guilford-Martin general
activity and ascendancy scales.

Variations between the two samples in 
the magnitudes of the correlations will be 
noted. These probably reflect population 
differences as well as differences in the rat- 
ing variables “scored” for each factor. For 
that matter the patterns of correlations ap- 
pear to “fit” the factors for the college sam- 
ple better than they do for the noncollege 
sample. This is particularly true of the de- 
pendency factor. 

TABLE 8
CORRELATIONS OF PERSONALITY AND ATTITUDE
QUESTIONNAIRE SCORES WITH STAFF JUDGMENTS

                                      College      Non-
                                      graduates    college
                                         r            r
Staff prediction with
  PPS dominance                         .29          .24
  GAMIN general activity                .20           a
  PPS abasement                        -.15         -.20
General effectiveness (I) with
  PPS dominance                         .33          .22
  GAMIN general activity                .28           a
  GAMIN ascendance-submission           .27           a
  PPS abasement                        -.20         -.22
  PPS nurturance                       -.08      [illegible]
  PPS deference                         .01         -.20
Administrative skills with
  PPS dominance                         .30          .30
  PPS abasement                        -.12         -.24
  PPS nurturance                       -.03         -.22
Interpersonal skills with
  PPS dominance                         .27      [illegible]
  PPS abasement                        -.07         -.20
Intellectual ability with
  PPS abasement                        -.23      [illegible]
Control of feelings with
  GAMIN ascendance-submission           .28           a
  PPS dominance                         .26          .13
  PPS succorance                       -.23         -.12
  SATL total                            .20          .08
  PPS exhibition                        .06          .25
  PPS abasement                        -.00         -.20
Work-oriented motivation with
  PPS dominance                         .28          .08
  PPS deference                         .22         -.17
  PPS achievement                       .21          .18
Passivity with
  GAMIN general activity               -.43           a
  GAMIN ascendance-submission          -.35           a
  PPS dominance                        -.27         -.29
  SATL total                           -.27         -.22
  PPS abasement                         .20          .28
  PPS intraception                     -.02         -.22
  PPS deference                         .04          .20
Dependency with
  PPS achievement                      -.29          .04
  PPS succorance                        .24          .04
  GAMIN general activity               -.23           a
  GAMIN ascendance-submission          -.23           a
  PPS dominance                        -.22         -.04
  PPS abasement                     [illegible]      .05
  PPS nurturance                        .21          .11
  SATL total                           -.21          .11
  GAMIN inferiority feelings           -.20           a
  PPS aggression                       -.20      [illegible]
  PPS deference                         .12          .22
  OQ authoritarianism                   .00         -.84
  OQ negativism                        -.00         -.28
Nonconformity with
  OQ authoritarianism                  -.41
  OQ acquiescence                      -.29
  PPS succorance                       -.29
  PPS change                            .22
  PPS abasement                        -.21
  PPS order                             .20

a Not administered to entire sample.

The relatively low correlations of the
personality and attitude questionnaire
variables suggest that the assessment staffs
may have been more influenced in their
judgments of motivational characteristics
by data obtained from the interview and
projective instruments than by questionnaire
results. Further studies will be necessary
to ascertain whether this is so. In
addition, the reliabilities of some of the
questionnaire variables may be relatively
low, which could result in reduced
estimates of the “true” correlations of the
underlying variables.
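
The point about low reliabilities can be illustrated with the classical correction for attenuation, in which an observed correlation is divided by the square root of the product of the two reliabilities. The sketch below is only an illustration under stated assumptions: the report does not apply this correction, and the two reliability values are hypothetical, since the Study's reliabilities had not yet been determined. The observed value of .29 is the college-sample correlation of PPS dominance with the staff prediction.

```python
import math


def disattenuate(r_xy, rel_x, rel_y):
    """Classical correction for attenuation: r_xy / sqrt(rel_x * rel_y)."""
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


observed = 0.29          # PPS dominance with staff prediction, college sample
rel_questionnaire = 0.70  # hypothetical reliability of the questionnaire scale
rel_judgment = 0.80       # hypothetical reliability of the staff judgment

estimate = disattenuate(observed, rel_questionnaire, rel_judgment)
print(f"observed r = {observed:.2f}, estimated 'true' r = {estimate:.2f}")
```

With perfectly reliable measures the estimate equals the observed value; the lower the reliabilities, the more the observed correlation understates the relationship between the underlying variables.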


Relative Contributions 


To obtain perspective on the relative con- 
tributions made by the various assessment 
methods the correlations of each method 
with the staff judgments were examined and 
the highest correlations selected. These cor- 
relations are shown in Table 9. 

The purpose for selecting the highest cor- 
relations was to obtain indications of the 
maximum contributions made by each 
method to the evaluations. These correla- 
tions, of course, may be underestimates of 
the total variance accounted for by any one 
method but are reasonably indicative of the 
relative contributions. For each of the rat- 
ing variables the method correlating highest 
with it is italicized. 

The comparisons show that some of the
methods contributed more than others to
the staff evaluations. The simulations
(group problems and In-Basket) show
generally higher correlations than the
paper-and-pencil devices. Among the latter, the
mental ability test shows up, on the
average, stronger than the personality
questionnaire. All the techniques, however, show a
good correlation with at least one factor.

The table shows also that the five tech- 
niques account for more of the variance in 
some factors such as Administrative skills 
(IV), Interpersonal skills (V), and Intel- 
lectual ability (VII) than in others like 
Control of feelings (VI), Work-oriented 
motivation (VIII), Passivity (IX), or De- 
pendency (X). It may be that some of the 
latter evaluations depend heavily on the 
interview or the projective tests. 

In order to ascertain the relative inde- 
pendent contributions of the more highly 
correlating methods to the overall judgment 
of the assessment staffs the methods selected 
were intercorrelated and multiple-correla- 
tion coefficients and regression weights 
against staff predictions for the two samples 
were computed. 

The four methods selected, along with 
the essential statistical information, are 
shown in Table 10. The correlations shown 
for the two group exercises are based on the 
observer ratings. SCAT 1A Verbal was se- 
lected as the measure of mental ability be- 
cause its correlations with the staff predic- 
tion were the highest of such measures. 

The four methods combined account for 
56% of the variance of the staff predictions 
in the college sample and 44% in the non- 
college sample. The regression weights vary, 
the Group Discussion and In-Basket having 


TABLE 9
HIGHEST CORRELATION OF EACH ASSESSMENT METHOD WITH STAFF JUDGMENTS

                             Staff         Factor
                             prediction    I   IV   V   VI   VII   VIII   IX   X   XI


College graduates 
Manufacturing Problem 4 4 3 


Group Discussion 60 67 48 
In-Basket 55 60 76 
Mental ability test 36 47 37 
Personality questionnaire 29 33 30 
Noncollege 
Manufacturing Problem 42 60 51 
Group Discussion 47 57 44 
In-Basket 51 59 08 
Mental ability test 44 50 72 


Personality questionnaire 24 22 30 


-35 —18 19 
—39 22 31 
45 39 30 44  —18  —15 17 
22 2 7) 17 —16 -4 53 
27 98 -3 238 -48 -9 Al 
TABLE 10
INTERCORRELATIONS AND REGRESSION COEFFICIENTS

                                1      2      3      4      5      Beta
College graduates
  1. In-Basket                  -     .17    .29    .26    .55     .37
  2. Manufacturing Problem     .17     -     .40    .16    .41     .17
  3. Group Discussion          .29    .40     -     .20    .60     .38
  4. SCAT 1A verbal            .26    .16    .20     -     .36     .16
  5. Staff prediction          .55    .41    .60    .36         (R = .75)
Noncollege
  1. In-Basket                  -     .28    .25    .30    .51     .33
  2. Manufacturing Problem     .28     -     .46    .22    .42     .19
  3. Group Discussion          .25    .46     -     .16    .38  [illegible]
  4. SCAT 1A verbal            .30    .22    .16     -     .44     .27
  5. Staff prediction          .51    .42    .38    .44         (R = .66)


the greatest weights in the college sample 
and the In-Basket and SCAT Verbal in the 
noncollege sample. Each of the methods 
makes a unique contribution, however, to 
the predictions of the assessment staffs. 
The analysis also makes it clear that the
three situational exercises had a major
influence on the judgments of the assessment
staffs. The three combined account for 50%
of the variance in the staff predictions for
the college sample and 31% for the
noncollege sample. In contrast the mental ability
measure, SCAT Verbal, accounts for only
6% and 12%, respectively, of the variance.
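
The regression weights and the variance figures quoted above can be checked from the intercorrelations in Table 10. The sketch below solves the normal equations for the college sample and then splits the squared multiple correlation into one term per method (regression weight times that method's correlation with the staff prediction). That this decomposition is how the 50% and 6% figures were obtained is an assumption, made because it reproduces the printed values; the report does not state the method. The small solver and all variable names are illustrative.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


methods = ["In-Basket", "Manufacturing Problem", "Group Discussion", "SCAT 1A verbal"]

# College-sample intercorrelations among the four methods (Table 10).
r_xx = [
    [1.00, 0.17, 0.29, 0.26],
    [0.17, 1.00, 0.40, 0.16],
    [0.29, 0.40, 1.00, 0.20],
    [0.26, 0.16, 0.20, 1.00],
]
# Correlation of each method with the staff prediction (Table 10, column 5).
r_xy = [0.55, 0.41, 0.60, 0.36]

betas = solve(r_xx, r_xy)                     # standardized regression weights
shares = [b * r for b, r in zip(betas, r_xy)]  # each method's term in R squared
r_squared = sum(shares)
multiple_r = math.sqrt(r_squared)
situational_share = sum(shares[:3])
scat_share = shares[3]

for name, beta, share in zip(methods, betas, shares):
    print(f"{name:<24} beta = {beta:.2f}  beta * r = {share:.2f}")
print(f"R = {multiple_r:.2f}, R squared = {r_squared:.2f}")
```

Because the published correlations are rounded to two places, the recomputed weights can differ from the printed ones in the second decimal.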


PREDICTION OF PROGRESS 


Because the Management Progress Study
has been in existence for only 9 years and
because the participants in the Study were
assessed over a 4-year span (1956-1960), it
would be presumptuous to expect maximally
discriminating criteria of progress in the
management hierarchy to have become
available yet. Furthermore, the assessment
procedures used in the first telephone
company to participate in the Study (summer
of 1956) were sufficiently revised so that
the assessment data obtained in that
company are of little value for comparison
purposes. Consequently, progress data spanning
8 years or less were obtainable for the
analyses to be described.

In July 1965, five of the participating
telephone companies submitted information
on the progress made up to that date by the
men in the Study. The data included the
management level achieved and current
salary. Table 11 summarizes these data,
showing the sample (coded), educational
backgrounds of the participants, the year
in which the men were assessed, and the
percentage at each level.

TABLE 11
PROGRESS IN MANAGEMENT

                                                            Percentage at each
                                                            management level (6/30/65)
Sample      N     Year assessed   Educational background       3      2      1
A           54    1957            College                      43     55      2
B           83    1958            Noncollege                    7     33     60
C1          27    1959            College                      15     78      7
C2          39    1959            Noncollege                   18     67     15
D1          19    1960            College                      32     63      5
D2          22    1960            Noncollege                   23     36     41
E           25    1960            College                      16     68     16
Combined   125    1957-60         College                      30     64      6
Combined   144    1957-60         Noncollege                   13     42     45
Combined   269    1957-60         College and noncollege       21     52     27

Levels 3 and 4 are the “middle-manage- 
ment” levels in the Bell System. Level 3 is 
the objective level for which those college 
graduates who are classified as management 
trainees were employed. They were ex- 
pected to achieve this level within a reason- 
able period of time (5 to 10 years). 

Approximately one-fifth (21%) of the
men assessed who were still employed in
1965 had achieved middle-management
status. Variations between samples are
marked, ranging from a high of 43% in A to
a low of 7% in B. A slight majority (52%)
of those assessed have achieved the second
level of management, whereas a fourth
(27%) are still at the first level. Again,
variations between samples can be noted,
ranging from 2% at the first level in A to
60% in B.

The college graduates generally have
progressed more rapidly than the noncollege
men. This is not surprising since all but
one or two of the college men were
employed as having middle-management
potential while considerably more of the
noncollege men were not so appraised by line
management at the time of assessment.
Whereas 30% of the college graduates have
achieved middle management, only 13% of
the noncollege men have done so.
Conversely, 45% of the noncollege men are still
at the first level of management while only
6% of the college graduates have failed to
achieve a higher level.

For each of the samples relationships
between management level attained and
assessment staff predictions (will achieve
middle management in 10 or fewer years)
are shown in Table 12.

TABLE 12
RELATIONSHIPS OF STAFF PREDICTIONS TO PROGRESS

                        Staff prediction            Management level (1965)
                        (will achieve                  3      2      1
Sample                  middle management)     N       %      %      %      p
A                       Yes                    33      58     39      3    .02
                        No or ?                21      19     81      0
B                       Yes                    20      30     70      0    .001
                        No or ?                63       0     21     79
C1                      Yes                    11      27     73      0    .17
                        No or ?                16       6     81     13
C2                      Yes                    13      38     62      0    .08
                        No or ?                26       8     69     23
D1                      Yes                    10      50     50      0    .09
                        No or ?                 9      11     78     11
D2                      Yes                     8      24     38     38     -
                        No or ?                14      21     36     43
E                       Yes                     8      37     63      0    .08
                        No or ?                17       6     71     23
Combined college        Yes                    62      48     50      2    .001
                        No or ?                63      11     78     11
Combined noncollege     Yes                    41      32     61      7    .001
                        No or ?               103       5     35     60
All samples combined    Yes                   103      42     54      4    .001
                        No or ?               166       7     51     42

For analysis purposes the data were
dichotomized (grouping first and second
levels). Appropriate significance tests were
then applied to determine whether the
observed relationships are statistically
reliable. The chi-square test was used with
Samples A and B and the combined samples
while, with one exception, Fisher’s exact
test (Siegel, 1956) was applied to the
remaining samples. For the exception,
Sample D2, no statistical test was applied,
because the relationship observed is
inconsequential.
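
The two tests named above can be sketched for a dichotomized 2 x 2 table (predicted Yes versus No or ?, by middle management reached versus not reached). The cell counts below are assumptions: they are reconstructed by applying the Table 12 percentages to the sample sizes for Samples A and C1, not taken from the raw data. Whether a continuity correction was applied to the chi-square is not stated in the report; the sketch uses Yates' correction as one plausible reading, and a one-tailed Fisher test.

```python
import math


def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square for the table [[a, b], [c, d]] and its p value (1 df)."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(diff - n / 2, 0.0)  # continuity correction
    chi2 = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p


def fisher_one_tailed(a, b, c, d):
    """P of a count of a or more in the top-left cell, margins held fixed."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = math.comb(n, col1)
    upper = min(row1, col1)
    return sum(
        math.comb(row1, k) * math.comb(n - row1, col1 - k)
        for k in range(a, upper + 1)
    ) / total


# Sample A (assumed counts): Yes -> 19 reached level 3, 14 did not;
# No or ? -> 4 reached level 3, 17 did not.
chi2_a, p_a = chi_square_2x2(19, 14, 4, 17)

# Sample C1 (assumed counts): Yes -> 3 reached level 3, 8 did not;
# No or ? -> 1 reached level 3, 15 did not.
p_c1 = fisher_one_tailed(3, 8, 1, 15)

print(f"Sample A: chi-square = {chi2_a:.2f}, p = {p_a:.3f}")
print(f"Sample C1: Fisher one-tailed p = {p_c1:.2f}")
```

Under these assumed counts the Fisher probability for Sample C1 rounds to the tabled .17, and the corrected chi-square for Sample A gives a probability below the tabled .02.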

Although the results vary from sample to 
sample, the most marked relationships being 
obtained for the two samples having the 
longest service in management since being 
assessed, they show that the assessment 
staffs were clearly able to identify those 
more likely to advance in their organiza- 
tions. Of the 55 men achieving middle man- 
agement, 43 (78%) were predicted correctly 
by the assessors. In contrast, of the 73 men 
who have not advanced beyond the first 
level of management the assessment staffs 
predicted that 69 (95%) would not reach 
middle management within 10 years. 

Insufficient time has elapsed for
completely evaluating the predictive accuracy
of the assessment staffs. Because the
predictions are for 10-year periods (to achieve
middle management), it will not be until
1970 that an evaluation can be made on all
of the men in the Study for the specified
time period. By that year all of the men
assessed will have completed at least 10
years of service in management since being
assessed.


Specific Variables 


In interpreting the factorial analysis of 
the assessment variables it was noted that 
judgments of general effectiveness appeared 
to account for much of the variance in the 
staff ratings. It was also pointed out that 
the staffs were able to make many discrimi- 
nations on more specific variables. It is of 
interest to determine, therefore, whether 
general impressions or judgments on more 
specific variables are the more predictive 
of progress in management. 

The criterion measure used in making
this analysis is salary progress (determined
by taking the difference between salary on
June 30, 1965, and salary at time assessed).
This measure has the advantage for
correlation purposes of being more discriminating
than management level. Furthermore, it
is not as dependent as is current salary on
previous salaries (in this instance on salary
at time assessed). In the seven samples
studied the correlations between
management level and salary progress range from
.38 to .84 with a median r of .71, indicating
that despite the restricted range of levels
the overlap between current level and
salary progress is substantial.

The correlations between the derived 
"factor" scores (based on the assessment 
variables) and salary progress appear in 
Table 13. In addition, correlations for the 
situational exercises, ability tests, and per- 
sonality questionnaires are shown. 

Because the correlations for three of the 
samples (D1, D2, and E) seem low and er- 
ratic, interpretation of these data focuses 
on samples A through C2. The men in the 
latter samples have had at least 6 years of 
service in management since being assessed. 
Presumably the measure of salary progress 
for these samples is sufficiently stable to 
yield meaningful correlations with the vari- 
ous predictions used. 

Though the correlations of each of the
judgment variables vary considerably
across the four samples, certain consistencies
can be noted. For one, the overall ratings
of the staffs on general effectiveness do
indeed have the highest correlations with
salary progress (median r of .48 compared to
the next highest median of .39). Secondly,
some more specific characteristics appear
more important than others in predicting
success in management. Thus,
administrative and interpersonal skills, intellectual
ability, lack of passivity, and control of
feelings appear to be more highly correlated
with progress in management than do the
other variables, particularly dependency,
which has relatively low correlations.
Firmer conclusions on the relative
importance of individual characteristics can be
drawn, however, when a more “optimal”
criterion of progress in management
becomes available.
TABLE 13


CORRELATIONS WITH SALARY PROGRESS


Predictor variable                   Sample
                                     A          B          C1         C2
                                     (N = 54)   (N = 83)   (N = 27)   (N = 39)

Staff judgment 


General effectiveness (I) 41* 
Administrative skills 33* 
Interpersonal skills 26 
Control of feelings 34* 
Intellectual ability 48* 
Work-oriented motivation 16 
Passivity —30* 
Dependency —25 
Nonconformity 34*
Situational exercises 
Manufacturing Problem 15 
Group Discussion 30* 
In-Basket 27* 
Ability Test 
SCAT verbal 36* 
SCAT quantitative 23 
SCAT total 38* 
Critical thinking in social science 26 
Contemporary affairs 35* 
Questionnaire 
Edwards PPS 
ach 20 
def —03 
ord —05 
exh —03 
aut —09 
aff —12 
int 14 
suc -14 
dom 26 
aba —32* 
nur -02 
chg 01 
end 02 
het 01 
agg 15 
Guilford-Martin 
G —03 
A 20 
M 17 
I 19 
N —07 
Attitudes toward life 08 
Opini i i 
p: "d questionnaire 02 
a —18 
08 


n 


* p less than .05 that r = .00.


51* 
24 
36 
50* 
30 
20 
—33 
—25 
32 


41* 
50* 
—01 


51* 


—21 


19 
Sample
D1         D2         E
(N = 19)   (N = 22)   (N = 25)
52* 24 13 34 
45* 32 -—11 24 
33* 29 28 40* 
32* 20 04 00 
07 30 —13 18 
41* 05 35 15 
—41* 21 —15 —40* 
01 07 24 07 
= 16 Gass 20 
50* 14 29 —01 
28 26 10 38 
22 03  —19 28 
30 19 —44* 14 
19 09 —10 —28 
28 18 —30 —08 
36* —02 —38 29 
—09 82. .—M 37 
25 —15 —28 —10 
—19 —06 02 42* 
21 15 39 —23 
25 —38 —21 17 
—07 01 —55* 04 
00 —25 16 18 
14 02 35 12 
—16 08 0 86-19 
—05 10 19 27 
11 —08 —11 07 
-14 —28 30 —01 
—11 13 —22 03 
03 15 39 00 
Eis dan eor —39 
01 13 -49  -08 
02 32 25 22 
12 35 22 09 
—05 14 13 06 
08 -10 16 —03 
18 —36 22 05 
—06 08 29  -0 
DA 222 12.. —25 
E —19 31 -14 
09 -0  -—925 —42* 
Assessment Techniques


That the various assessment methods also
vary in predictive accuracy is apparent
from inspecting the data in Table 13. The
situational exercises and the
paper-and-pencil ability tests have higher correlations
across the four samples than do the
personality questionnaires.

Among the situational exercises the 
Manufacturing Problem has the higher cor- 
relations while the In-Basket has the lower 
ones. The SCAT, particularly the Verbal 
part, has the higher correlations among the 
ability tests. 

Because of the cost of assessment
procedures a question could be raised regarding
the gain obtained from using such
procedures over the use of much simpler ones, for
example, paper-and-pencil ability tests.
Though the data in Table 13 do not provide
a definitive answer to such questions, they
offer clues.

The correlations of the assessment rat- 
ings, particularly the overall ones on gen- 
eral effectiveness, do tend to be higher than 
are the correlations for the more specific 
variables or for individual techniques. Sec- 
ondly, the correlations for the more elabo- 
rate situational exercises compare favorably 
to those of the ability tests. Finally, when 
mental ability, measured by a paper-and- 
pencil test, is partialled out of judged abil- 
ity reliable variance remains. 

The results of the latter analysis are
shown in Table 14. For each sample the
staff judgment variable and the ability test
correlating highest with salary progress
were selected. Partial correlations then were
computed with the ability test being
partialled out of the correlation between the
staff judgment and salary progress. The t
statistic was used to test for the reliability
of the partial correlations.
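
The partialling step can be sketched with the standard first-order partial correlation formula and its t test on N - 3 degrees of freedom. In the example the two correlations with salary progress (.48 for the staff judgment, .38 for the ability test) are the Sample A values from Table 14, and N = 54 is the Sample A size from Table 11. The correlation between the staff judgment and the ability test is not reported in this section, so the value of .60 used here is hypothetical; the result is therefore an illustration, not a reproduction of the tabled partial r.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x with y after z is partialled out of both."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def t_for_partial(r, n):
    """t statistic and degrees of freedom for a first-order partial r."""
    df = n - 3
    return r * math.sqrt(df / (1 - r ** 2)), df


r_judgment_salary = 0.48  # staff judgment with salary progress, Sample A
r_test_salary = 0.38      # ability test with salary progress, Sample A
r_judgment_test = 0.60    # hypothetical: not reported in this section

partial = partial_r(r_judgment_salary, r_judgment_test, r_test_salary)
t_value, df = t_for_partial(partial, n=54)
print(f"partial r = {partial:.2f}, t = {t_value:.2f} on {df} df")
```

The higher the correlation between the judgment and the test, the smaller the partial r becomes, since less of the judgment's predictive variance is independent of measured ability.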

In three of the four samples, reliable 
variance remains after partialling out the 
test scores. The results thus indicate that 
the assessment process does contribute more 
than can be gained by the simple adminis- 
tration of paper-and-pencil ability meas- 
ures. 


Discussion 


Though much research on the assessment 
process in the Management Progress Study 
remains to be done, the results so far ob- 
tained are informative regarding: 

1. The nature of the assessment staff 
evaluations. 

2. Contributions to staff judgments of the 
techniques employed. 

3. The “validity” of the evaluations. 

Previously reported research on the use 
of assessment methods has focused on the 
“predictive validity” of such methods. Rela- 
tively little information has been generated 
on the assessment process per se. 

In this regard recognition should be given 
the authors of the OSS (1948) report who 
place considerable emphasis on the entire 
assessment process. Much of their discus- 
sion, however, is either theoretical or de- 
scriptive. Relatively little data regarding 
the nature of the judgments made or the 
contributions of the various techniques em- 
ployed are presented. 

TABLE 14
PARTIAL CORRELATIONS WITH SALARY PROGRESS

                        A                    B                    C1                   C2
                   r    Partial r       r    Partial r       r    Partial r       r    Partial r
Staff judgment    .48   [illegible]*  [illegible]  .39**    .51     .29          .52     .42**
Ability test      .38                  .46                  .51                  .36

* p less than .05 that r = .00.
** p less than .01 that r = .00.

Many of the published studies do offer
considerable information regarding the
predictive validities of the various techniques
used in assessing their subjects. For that
matter, despite contrary theoretical
considerations, several investigators have
treated the techniques on a par with the
judgments of the assessment staffs. The
study of Air Force officers (MacKinnon
et al., 1958) can be cited as an example. In
the report of this study the correlations
between over 600 predictors and several
criteria are shown. Of the predictors, only a
relative few reflect the judgments of the
assessment staff.


Nature of the Judgments 


From the factorial analyses of the ratings 
by the Management Progress Study assess- 
ment staffs it is apparent that the staffs 
were influenced considerably by their over- 
all judgments of the men assessed, particu- 
larly in evaluating potential for advance- 
ment in management. On the other hand, 
the staffs did make many intraindividual 
discriminations; the ratings reflect much 
more than “halo.” 

“Halo” in rating may reflect the inability
of the raters to make discriminations among
the various characteristics evaluated. The
factorial results from the Study data
indicate, however, that the judgments of the
assessors were based on observed
consistencies in the behaviors of the men assessed
and on their judgment regarding the
relative importance for managerial potential of
the various characteristics rated. This
inference is supported by the intercorrelations
between evaluated performances in different
techniques (see Table 10), by the varied
loadings on the general factors (Appendix
B), and by the fact that for each sample
much of the variance in the staff ratings of
potential is accounted for by the general
factor.

The fact that the assessment staffs made 
many discriminations among the character- 
istics rated is further evidence that the 
“halo” inferred from the factorial results is 
a resultant of other than rater error. Ac- 
tually, the factorial results do not com- 
pletely reflect the extent of the discrimina- 
tions made. Once the reliabilities of the 
rating variables are determined it will be 
possible to ascertain the uniqueness of each 
variable. 

Comparisons of the factorial findings for
the Management Progress Study with the
results of published studies are limited
because relatively few similar analyses have
been reported, because methods in making
such analyses have varied, and because of
variations in the numbers and kinds of
variables on which the assessment staffs
made judgments. In the OSS (1948) study
four factors were obtained from 11 rating
variables. Kelly and Fiske (1951) report
nine first-order factors and five
second-order factors from analysis of 42 variables.
In the study of Air Force officers
(MacKinnon et al., 1958) a cluster analysis of 30
variables yielded three clusters. Holt and
Luborsky (1958) report three factors from
each of two separate analyses of 20 variables.

Furthermore, few attempts have been 
made to ascertain the extent of “halo” in 
the ratings. In the Holt and Luborsky 
(1958) analyses general factors do account 
for much of the variance obtained in each 
analysis. A median correlation of .42 be- 
tween the factors (oblique rotation) re- 
ported by the OSS (1948) suggests that 
much of the variance could have been ac- 
counted for by a general factor. 

Some effort has been made to ascertain 
the reliabilities of assessment staff judg- 
ments. Kelly and Fiske (1951), in particu- 
lar, report much evidence regarding such 
reliabilities. Though estimates of the reli- 
abilities of individual assessors in the Man- 
agement Progress Study are yet to be 
determined, the magnitudes of the commu- 
nalities obtained from the factor analyses 
indicate that the pooled ratings for many 
of the variables are reasonably reliable. 


Contributions of the Techniques 


The data reported make it apparent that
the situational techniques (group exercises
and In-Basket) used in the Management
Progress Study produced, despite their
complexities, reasonably reliable results and
that they markedly influenced the
judgments of the assessment staffs. The
paper-and-pencil instruments had less influence
on staff evaluations generally, though they
did influence them in many specific ways.

The findings further indicate that neither
kind of technique could have been omitted
without loss of important information. All
of the methods so far investigated
apparently contributed some unique information.
A better evaluation of all the techniques can
be made once studies on the interview and
projectives have been completed.

Economic considerations increase the im- 
portance of the findings regarding the situ- 
ational techniques. These methods are 
costly and time consuming to administer. 
The data presented appear to justify the 
costs entailed. 

Comparisons of the contributions to the 
staff evaluations made by the various tech- 
niques used in the Management Progress 
Study to results obtained by other investi- 
gators are restricted almost entirely to the 
OSS (1948) report. Many investigators, as 
has been noted, have reported the relation- 
ships of specific techniques to performance 
criteria, particularly supervisory ratings. 
Such data, however, shed little light on the 
assessment process. 

The OSS (1948) assessors viewed the 
clinical interview as the nucleus of their 
assessment program. The interviewer pre- 
pared for the interview by reviewing a com- 
pleted personal history questionnaire and 
the results obtained from a variety of 
paper-and-pencil tests. Subsequent to the 
interview he rated the assessee on several 
variables, thus facilitating eventual analysis 
of the data, and prepared a portion of a 
personality sketch of the individual for 
presentation at the assessment staff meet- 
ing. Consequently, it is hardly surprising 
that the interviewer ratings correlated 
highly with all of the assessment variables, 
being highest with 7 out of the 10 charac- 
teristics rated. 

The situational techniques also were ma- 
jor contributors to the OSS assessment 
process. A “situationist” presented another 
portion of the personality sketch prepared 
on each assessee at the staff meeting, along 
with recommendations of the staff team 
which administered the situational tests. 
Subsequent correlational analysis resulted 
in relatively high correlations with the staff 
ratings, though ranging considerably in 
magnitude. 

Very little data on the paper-and-pencil
tests and none specifically on the projectives
are given in the OSS report. Two mental
ability tests correlated in the .60s with the
staff rating on “Effective Intelligence.”

Kelly and Fiske (1951) report a little 
data on relationships between paper-and- 
pencil scores and staff ratings. The correla- 
tions between such scores and the final rat- 
ing of overall suitability are generally low. 
A mental ability test correlates .36 with the 
rating whereas scores on a variety of inter- 
est and personality questionnaires range 
from —.21 to .25. 

From the limited amount of information 
available it is apparent that much more 
data on the contributions of various kinds 
of techniques to the assessment process 
would prove useful. For that matter, as- 
suming the original data to be available, 
analyses similar to those being carried out 
on the Management Progress Study data 
would contribute to a better understanding 
of the assessment process. 


Validity of Evaluations 


The end product of an assessment process 
is a series of evaluations by the assessors. 
The success of the activity depends upon 
the accuracy of these judgments. 

In the Management Progress Study the 
staff evaluations consisted of ratings on 25 
characteristics deemed relevant to the pur- 
poses of the study and overall evaluations 
of management potential and likelihood of 
remaining in the employ of the Bell System. 
Two kinds of evidence are presented in this 
report which bear on the accuracy of the 
ratings. 

The first kind can be considered "in- 
ternal.” It consists of factorial results and 
the correlations between scores for various 
techniques and the ratings. 

The second kind can be thought of as 
“external.” It has to do with the “predic- 
tive" validity (American Psychological As- 
sociation, 1954) of the ratings. It consists 
of the correlations and other data showing 
relationships between the evaluations and 
subsequent progress in management. 

The first kind of evidence is more sugges- 
tive than definitive. The factorial results, 
for example, make sense. The variables 
rated tend to load on the factors which one 
would expect them to, though the results 
appear to be somewhat more reasonable for
the college than for the noncollege sample. 

The correlations between the various 
techniques and the composite ratings based 
on the factorial results also appear reason- 
able. Again, however, the findings seem to 
be more consistent in some instances for 
the college sample than for the noncollege. 
Whether the homogeneity of the college 
sample or a better "scoring" of the factors 
generated from this sample contributes to 
the apparent discrepancies has not been 
determined. 

Some examples of the reasonableness of 
the correlations might be cited: 

1. The relatively high correlations of the 
In-Basket with administrative skills, of 
the group exercises with Interpersonal skills, 
of the mental ability tests with Intellectual 
ability, of GAMIN General Activity (nega- 
tively) with Passivity, and of the Con- 
temporary Affairs Test (positively) and 
Authoritarianism (negatively) with Non- 
conformity. 

2. The pattern of correlations of several 
personality measures with Dependency 
(college sample, though not for noncollege 
sample). 

More evidence, of course, would be de- 
sirable. Such evidence could come from re- 
finements in the factor scoring and from 
quantifying the projectives and the inter- 
views. 

The evidence for “predictive” validity, 
though incomplete, is much more precise. 
The criterion measures used reflect progress 
5 to 8 years subsequent to assessment. The 
criterion, though not “ultimate” (Thorn- 
dike, 1949), is of the kind referred to by 
Cronbach (1960) as “convergent.” As such, 
it would be expected to correlate with a 
theoretically “ultimate” criterion of man- 
agerial success. In the Study, of course, 
“progress” in management is a criterion 
worthy of investigation in its own right. 
Presumably, many of the characteristics 
assessed should correlate with actual prog- 
ress, as reflected by salary progress or ad- 
ministrative level. 

The relationships between the assess- 
ment results and progress presented in this 
report are restricted by the relatively short 


period of time involved. Only a fifth of the 
men on whom data were obtained had 
achieved the third level of management 
(“middle management" in the Bell System). 
Approximately half were still at the second 
level. Some of the latter will advance; 
others may not. A more discriminating 
measure of progress will thus become avail- 
able in a few years. 

Despite the restrictions in criterion 
spread, however, the predictions made by 
the assessment staffs are quite accurate.
Approximately 80% of those who have ad- 
vanced to middle management were judged 
by the assessment staffs as having such 
potential. The predictions were even more 
accurate for those who have not advanced 
beyond the first level. Most of these men 
(95%) were judged as lacking in advance- 
ment potential. Evidently identifying the 
less adequate is easier than identifying 
those with more promise. 

The results of correlating the assessment 
ratings, based on the factorial results, with 
salary progress indicate that no single
characteristic determines progress in man- 
agement. A composite of characteristics 
correlates higher across samples than do 
any of the more specific variables. The char-
acteristics contributing most of the variance
in the composite are administrative and 
interpersonal skills, intellectual ability, lack 
of passivity, and control of feelings. Work- 
oriented motivation has a lower correlation, 
while dependency, or lack of it, bears prac- 
tically no relationship to progress. 

The magnitudes of the higher correla- 
tions between the assessment ratings and 
salary progress compare favorably with 
similar correlations appearing in published 
reports. A few studies reporting correlations
of greater magnitude have appeared (Cron-
bach, 1960; Trankell, 1959). Correlations
of approximately the same magnitude also 
have been reported (Albrecht et al., 1964; 
Cronbach, 1960). Several investigators have 
reported considerably lower correlations 
(Campbell, Otis, Liske, & Prien, 1962; 
Holt & Luborsky, 1958; Kelly & Fiske, 
1951; Kelly & Goldberg, 1959; MacKinnon 
et al., 1958; OSS, 1948). 

Comparisons of this type are, of course, 
limited in value. The criteria have varied
as have the assessment methods used and 
statistical methods for evaluating the re- 
sults. Furthermore, no standards exist for 
determining what could be considered an 
“acceptable” correlation. Predicting com- 
plex criterion behavior is an art that re- 
mains in the initial stages of development. 
It bears repeating, also, that the progress 
criterion used in the Study is a developing 
one. It may take several years before a 
stable and fully discriminating criterion of 
progress is obtained. When achieved, refine- 
ments in the criterion can be made (e.g., 
adjusting for departmental differences in 
rates of progress) and better estimates of 
the predictive accuracy of the assessment 
ratings determined. Furthermore, assum- 
ing “error” in both the ratings and the prog- 
ress criterion, analyses will be made to as- 
certain the locus and nature of such error. 
A final note regarding the “predictive 


validities" of the assessment methods used 
should be made. The situational exercises
and the paper-and-pencil ability tests are 
predictive of progress in management 
whereas none of the personality question- 
naires correlate consistently with the cri- 
terion. Justification for the high cost of the 
assessment approach, moreover, can be ob- 
tained from the finding that the assessment 
ratings account for more of the variance in 
the progress criterion than do the simpler 
paper-and-pencil ability tests or, for that 
matter, than does any single method used. 

In conclusion, it should be noted that 
prediction of progress in management was 
only one purpose of the Management Prog- 
ress Study assessment centers. The other, 
and perhaps even more important purpose, 
was to provide a comprehensive picture of 
a fairly large number of young men as a 
base line for a study of the developmental 
changes of young adulthood. 
MOTIVATION AND MEMORY


BERNARD WEINER’ 
University of California, Los Angeles 


15 studies which examine the effects of motivation on memory are presented. 
It was demonstrated that the effects of motivation on retention are in part 
determined by the magnitude of incentive, quality of incentive, nature of 
the activity intervening between stimulus onset and recall, place in the mem- 
ory sequence at which the motivational factor is introduced, type of 
stimuli, and type of experimental design. It is suggested that research in the 
area may require both between-Ss and within-Ss experimental designs. Re- 
hearsal, repression, and action decrement are discussed briefly. 


The studies reported in this monograph
provide evidence concerning a deceptively 
complex question: “Does motivation affect 
retention?” Two literature reviews (Rapa- 
port, 1942; Weiner, 1966) have answered 
this question affirmatively. In support of his 
position, Rapaport cites the Lewinian stud- 
ies of the recall of completed and incom- 
pleted tasks, hypnotic phenomena and 
hypermnesia, recall of pleasant and un- 
pleasant experiences and the retention of 
stimuli associated with affective states, and 
the remembrance of traumatic events.
Weiner also concluded that "there are stud-
ies which provide strong evidence that
memory can be influenced by nonassocia-
tive factors [p. 24]." The critical demon-
stration experiments cited in that review
pertain to the recall of high-arousal stimuli 
(Kleinsmith & Kaplan, 1963; Walker & 
Tarte, 1963); retention of stimuli associ- 
ated with positive and negative incentives 
(Heyer & O'Kelley, 1949; Weiner & Wal-
ker, 1966); retention of affective material
(Blum, 1961); and the recall of stimuli 
during heightened muscular tension 
(Bourne, 1955). (The reader is directed to 
Weiner, 1966, for a detailed analysis of 
these and related studies.) 

In this paper, 15 studies are presented 
which examine the effects of motivation on 
memory. The experimental procedure in 
these investigations is guided by current 
criticisms of retention studies (Keppel, 
1965; Underwood, 1954, 1964) and by a re- 
cent technique developed in the study of 
short-term memory (Peterson & Peterson, 
1959). Underwood and Keppel have been 
critical of retention studies because of the 
confounding of learning with retention.
Keppel (1965) illustrates this confusion 
with the following example: 


...Peterson, Peterson, and Miller (1961) found 
higher recall for words than for low meaningful 
nonsense syllables 6 seconds following presenta- 
tion. If it can be shown that differences in the re- 
call of these items were also present on an imme- 
diate retention test, it will not be possible to 
determine whether the differences in the delayed 
retention test are to be attributed to the effect of 
meaningfulness on learning (estimated by the im- 
mediate retention test), or over the retention in- 
terval, or to the action of meaningfulness on both 
learning and retention. [p. 7] 


The analysis by Underwood and Keppel
led Weiner (1966) to conclude that a num-
ber of studies in the area of motivation and
memory also are subject to methodological 
criticism. For example, the Zeigarnik phe-
nomenon often has been cited as supporting
a hypothesized motivation-memory linkage.
Yet Caron and Wallach (1957) have 
demonstrated that differences in the recall 
of completed and incompleted tasks must 
be attributed to differences in the degree of 
original learning rather than to differences 
in retention. A similar criticism is appli- 
cable to investigations of the relations be- 
tween attitudes and retention (e.g., Levine 
& Murphy, 1943). A recent study by Fitz- 
gerald and Ausubel (1963) provides evi- 
dence that differences in recall as a function 
of attitude toward the content of a message 
are produced by differential learning of that 
message. 

In the research on memory reported here 
the degree of original learning between mo- 
tivational and nonmotivational conditions is 
equated. That is, the effects of motivation
on retention are disentangled from the ef- 
fects of motivation on learning. Following 
previous suggestions (e.g., Cameron, 1947; 
Melton, 1963) memory is conceptualized 
as a multistage process. The initial period 
involves sensory or ideational registration 
and the subsequent fixation of that event. 
This is the stage of learning or trace forma- 
tion. In the second phase of the memory 
process the trace of the event is latent, yet 
potentially available for evocation. This is 
often referred to as the period of trace 
storage. In the final process of the sequence 
the trace is revived by the organism. This 
is the stage of trace evocation or trace re- 
trieval. Only the latter two stages, storage 
and retrieval, are adjudged to be memory 
processes. Studies of the influence of moti- 
vation on memory must be able to relate 
the motivational manipulation to changes 
occurring during either of these two stages. 

A second criticism of previous studies in 
the area of motivation and memory is that 
the experiences between the occurrence of 
the to-be-remembered event and subsequent 
recall are not controlled. This also can re- 
sult in a confounding of learning with re- 
tention. For example, some investigators 
have found that pleasant experiences are 
more likely to be retained than unpleasant 
experiences (cf. Meltzer, 1930). However, 
if the enhanced recall of pleasant experi- 
ences is produced by intervening rehearsal 
(verbal repetition), then the differential 


recall would be of little theoretical signifi-
cance for the study of memory. Most psy-
chologists would agree that the probability
of recall of an event is a function of the 
number of repetitions of that event; this 
is a fundamental law of learning. There- 
fore, the behavior between the onset of a 
stimulus and subsequent recall must be 
controlled in experiments relating motiva-
tion and memory. In the investigations de-
scribed in this monograph a modification
of the Peterson and Peterson (1959) tech- 
nique used in the study of short-term mem- 
ory is employed. In that procedure the be- 
havior of the individual between stimulus 
onset and stimulus recall is prescribed, and 
overt rehearsal is prevented. 

More uncertainty is conveyed by the 
term “motivation” than by the word “mem- 
ory.” Following Atkinson (1964), this 
writer considers any contemporary deter- 
minant of behavior to be a motivational 
variable. Hull (1952) and Spence (1956) 
include drive, habit, and incentive among 
the immediate determinants of action; 
Lewin (1938) conceptualizes behavior to 
be a function of tension, psychological dis- 
tance, and valence; Atkinson’s (1964) 
model of the determinants of behavior 
comprises motive, expectancy, and incen- 
tive. Motivational theorists therefore con- 
ceive behavior to be a function of properties 
of the organism (drive level, magnitude of 
tension, motive strength), attributes of the 
environment (valence or incentive of the 
goal), and an associative or learning factor 
(habit strength, psychological distance, ex- 
pectancy). Previous studies in the area of 
motivation and memory have related non- 
associative factors pertaining to the state 
of the organism (arousal level, attitude, 
motive, etc.) and/or characteristics of the
environment (message content, affective
tone, magnitude and quality of incentive,
etc.) to recall (cf. Weiner, 1966). In the
studies described in this paper the incentive 
offered for retaining a stimulus is varied. 
Incentives were selected as the main inde- 
pendent variable because of the relative 
ease and experimental feasibility of manip- 
ulating their quantity and quality. It is be- 
lieved that the general pattern of results 
emerging from this experimentation will be 
replicable with other motivational manipu-
lations. That is, the data are assumed to
reflect general relationships between moti- 
vation and memory. 

The reader must be forewarned that the 
purpose of this research is not merely to 
ascertain whether motivation influences 
memory. The existence of this effect was 
decided in an earlier paper (Weiner, 1966). 
It is hoped that the research will result in 
a specification of the conditions under which 
this relationship can be expected to hold. 
Frequently the investigator is able to iso- 
late a relevant condition, yet the finding is 
not investigated in detail. It is evident that 
much work remains to be done after an 
initial demonstration. However, this pro- 
gram of research is directed to the exposure 
of a variety of factors affecting the rela- 
tionship between motivation and memory, 
rather than to the systematic understand-
ing of any individual determinant. To
provide continuity and to convey the evo-
lution of the research program, the ex-
periments are reported in their historical
sequence of occurrence. Often a problem
appearing earlier in the program is tem- 
porarily put aside or not discovered until a 
later investigation. 


EXPERIMENT I


The initial investigation of the series was 
conducted by Weiner and Walker (1966). 
Inasmuch as that study serves as the proto- 
typical experiment, the experimental pro- 
cedure will be repeated here in detail. In 
the discussion of later experiments only the 
modifications of that procedure will be 
reported.


Method 


Subjects. Twenty male students enrolled at the
University of Michigan participated as paid sub-
jects.

Materials. Eighty consonant trigrams of less 
than 30% associative strength (Witmer, 1935) were 
used as stimuli. The consonants “v” and “w” were 
excluded; each of the remaining 18 consonants 
was used in no less than 10 and no more than 
15 of the trigrams. The trigrams were printed 
on slides with one of four background colors: red, 
yellow, green, or white. Twenty stimuli were ran-
domly assigned to each color.
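
The constraints on the stimulus set can be checked mechanically. The following is a minimal sketch, not the original selection procedure. It assumes that the 18 remaining consonants are the letters left after dropping "v" and "w" and treating "y" as a vowel (the text does not list them); the names ALLOWED and usage_ok are hypothetical.

```python
from collections import Counter

# Assumed letter set: the 18 consonants left after excluding "v" and "w",
# with "y" treated as a vowel. The source does not enumerate the letters.
ALLOWED = "bcdfghjklmnpqrstxz"


def usage_ok(trigrams, low=10, high=15):
    """Return True if only allowed consonants appear and each allowed
    consonant is used at least `low` and at most `high` times."""
    counts = Counter("".join(trigrams).lower())
    if any(letter not in ALLOWED for letter in counts):
        return False
    return all(low <= counts[c] <= high for c in ALLOWED)


# 80 trigrams supply 240 letter slots, so each of the 18 consonants
# is used about 13.3 times on average, inside the stated 10-15 range.
mean_uses = 80 * 3 / len(ALLOWED)
```

The check covers only letter frequency; associative strength (Witmer, 1935) would have to be looked up separately.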

Procedure. The subjects participated in a short- 
term memory task. On each trial the background 


color on which the trigram appeared informed the 
subjects of the incentive for correctly remember- 
ing the stimulus. There were four experimental 
conditions corresponding to the four colors: win 
one cent for correctly recalling the stimulus, win 
five cents, receive a shock for not correctly re-
calling the stimulus, and a control condition in
which neither shock nor money was used as an in- 
centive. The intensity of the pulse shock was 110 
volts with an amperage of 60 microamps. The 
shock was delivered to the upper arm of the sub- 
ject. 

Subjects were first informed of the color-value 
pairings. To ensure that the value of each color 
was retained, the subjects were administered a 10- 
trial, 4-item paired-associates list consisting of the 
color-incentive pairs. The experimenter read the 
colors aloud; subjects were corrected following 
wrong responses. There generally were no incor- 
rect anticipations following the third trial. 

In the short-term memory task which followed, 
the to-be-remembered stimuli were projected on 
a screen for .75 second. Then slides containing 
random single digits were projected. There were 
60 digits on each slide. The interslide interval was 
approximately .70 second. As an interpolated ac- 
tivity subjects were required to read the digits in 
time to a metronome which beat 3.25 times per 
second. There were two time intervals for the in- 
terpolated activity: 4.67 seconds and 15 seconds. 
Therefore, approximately 15 or 49 digits were read 
during the interpolated time period. Both the tri- 
grams and the digits were read aloud. The recall 
time allowed between the offset of the digits and 
the onset of the next stimulus was 12 seconds. Re- 
call was cued by the appearance of a blank slide on 
the screen. Following Trials 20, 40, and 60 there 
was a 10-second time delay to change slide trays. 
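
The digit counts given above follow from the metronome rate and the two interval lengths. The short sketch below reproduces that arithmetic; it assumes one digit is read per metronome beat, and the function name digits_read is hypothetical.

```python
def digits_read(beats_per_second, interval_seconds):
    """Digits read aloud, assuming one digit per metronome beat."""
    return beats_per_second * interval_seconds


RATE = 3.25  # metronome beats per second, as stated in the procedure

for interval in (4.67, 15.0):
    # prints "4.67 15" and then "15.0 49"
    print(interval, round(digits_read(RATE, interval)))
```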

Subjects were in all experimental conditions. In
the first and last 40 trials each of the eight con- 
ditions (four incentive conditions X two time 
intervals) appeared five times. Within these 40 
trials the order of presentation was randomized.
The order of the stimuli was constant across sub- 
jects, and for all subjects the incentive value as- 
sociated with a given color remained the same 
during the experiment. The design was counter- 
balanced so that every color-trigram pairing was 
associated with each of the incentive conditions an 
equal number of times. Subjects were randomly 
assigned to the various color-incentive combina- 
tions. 
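
The blocked randomization described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the function make_schedule and its parameters are hypothetical, and the counterbalancing of color-trigram pairings across subjects is not modeled.

```python
import random

INCENTIVES = ["one cent", "five cents", "shock", "control"]
INTERVALS = [4.67, 15.0]  # seconds of interpolated activity


def make_schedule(incentives, intervals, reps_per_block, n_blocks, seed=None):
    """Build a trial list in which every incentive x interval cell appears
    reps_per_block times in each block, in random order within the block."""
    rng = random.Random(seed)
    cells = [(inc, t) for inc in incentives for t in intervals]
    schedule = []
    for _ in range(n_blocks):
        block = cells * reps_per_block
        rng.shuffle(block)
        schedule.extend(block)
    return schedule


# Two blocks of 40 trials, each containing the eight conditions five times.
trials = make_schedule(INCENTIVES, INTERVALS, reps_per_block=5, n_blocks=2, seed=0)
```

Blocking in this way guarantees that each half of the session contains every condition equally often, so practice or fatigue effects cannot fall on one condition alone.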


Results 


In Figure 1 the percentage of correct re- 
sponses is plotted as a function of incentive 
and the time of the interpolated activity. 
The analysis of variance performed on this 
data reveals that there is a significant main 
effect attributable to the incentive condi- 
tion, F(3, 57) = 6.94, p < .01, and a sig- 
nificant interaction between the incentive 
condition and the time of the interpolated
activity, F(3, 57) = 4.74, p < .01. Com-
paring retention within a time interval
with a Newman-Keuls paired-mean test 
shows that at the shorter interval there are 
no significant differences between any of 
the means. At the 15-second time interval 
retention is significantly better (p < .01)
in the five-cent and shock conditions than 
in either the one-cent or control conditions. 
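
The degrees of freedom reported for these F ratios are those expected when 20 subjects serve in every condition. The sketch below assumes a fully within-subjects analysis in which each effect is tested against its own effect-by-subjects error term; the source does not state the error term explicitly, and the function name within_subject_df is hypothetical.

```python
def within_subject_df(n_subjects, *factor_levels):
    """Degrees of freedom for the effect formed by crossing the given
    within-subject factors, and for its effect x subjects error term."""
    effect_df = 1
    for levels in factor_levels:
        effect_df *= levels - 1
    error_df = effect_df * (n_subjects - 1)
    return effect_df, error_df


print(within_subject_df(20, 4))     # incentive main effect: (3, 57)
print(within_subject_df(20, 4, 2))  # incentive x time interaction: (3, 57)
```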


Discussion


The results support the general hypothe-
sis that motivation affects retention. The
study demonstrated that the recall of an 
event is in part a function of the anticipated 
outcome signaled by that event. Differ- 
ential rates of forgetting were exhibited 
when the to-be-remembered stimuli were 
identical in the different conditions. This is 
important because Underwood (1964) has 
argued convincingly that the degree of 
original learning produced by intrinsic dif- 
ferences in the to-be-retained stimuli, for 
example, the number of units in the mate-
rial, has been confounded with differences
in retention. Further, recall did not signifi-
cantly differ between conditions at the
shorter time interval, and varied between 
80-90%. It therefore is extremely unlikely 
that the differences in recall at the 15-sec- 
ond interval can be attributed to differences 
in the degree of original learning. The inter-
action between the time of the interpolated
activity and the incentive conditions in- 
dicates that the storage of the trace is the 
process affected by the motivational manip- 
ulation employed in this study. 

The purpose of this research is to isolate 
a number of factors which affect the rela- 
tionship between motivation and memory. 
Two factors which in part influence memory 
have been identified in this investigation: 
the magnitude of an incentive (penny 
versus nickel), and the type of incentive 
(aversive and positive). 


EXPERIMENT II


In the Weiner and Walker study, feed- 
back was conveyed only when administer- 
ing or withholding the shock. Conceivably 
the difference in recall between the shock 
and nonshock conditions could be at- 
tributed to differential knowledge of re- 
sults (KOR). In this study further controls 
were instituted by providing KOR after 
every response. Following each response the 
experimenter said: “No penny” or
“Penny”; “No nickel” or “Nickel”; “No
shock,” or shock was administered; or 
“Wrong” or “Right” in the appropriate 
condition. Other methodological changes in- 
cluded the substitution of purple for white 
as a color cue, and an extension of the short 
time interval from 4.7 seconds to 5.6 sec- 
onds. 

Subjects were 20 male students enrolled 
in the introductory psychology course at the 
University of Minnesota. 


Results 


In Figure 2 the percentage of correct re- 
sponses is plotted for the four incentive con- 
ditions and the two time intervals. An anal- 
ysis of variance reveals that there is a 
significant main effect due to the experi-
mental conditions, F(3, 57) = 5.81, p <
.01. The interaction between the incentive 
and time interval does not reach statistical 
significance, F(3, 57) = 1.44, p < .25. Al-
though this interaction is not significant,
paired-mean tests are reported to allow 
further comparisons between the findings 
in Experiments I and II. A Newman-Keuls 
paired-mean test indicates that there are
no significant differences in recall between
any of the conditions at the short time in-
terval. At the long interval, recall of stim-
uli associated with shock is significantly
greater (p < .05) than recall in the other
three incentive conditions. There are no
significant differences in recall at the long
interval between the penny, nickel, and
control conditions.

Fig. 2. Percentage retention at two time in-
tervals as a function of incentive, employing 2
degrees of monetary reward and feedback after
each response.


Discussion 


The results partially replicate the find-
ings in the Weiner and Walker experiment.
In the present study, as in the previous in- 
vestigation, the stimuli paired with shock 
were recalled more often than stimuli as- 
sociated with a one-cent reward and stim- 
uli for which neither shock nor money were 
at stake. However, this investigation failed 
to replicate the differences in recall between 
the penny-nickel and control-nickel con- 
ditions which were found previously. 

The significant differences in recall be-
tween the shock and control conditions es-
tablish that differential feedback cannot
account for the unequal rates of forgetting 
in those conditions. Further, the equality 
in retention at the short time interval,
which varied between 78-85%, and the di- 
vergent decay rates found for identical tri- 
grams again strongly suggest that height- 
ened motivation during the perception of 


stimulus affects the subsequent availability 
(storage) of that stimulus. 

The failure to replicate the findings in the 
Weiner and Walker study concerning the 
effectiveness of the positive reward is some- 
what puzzling. In the Weiner and Walker 
study the nickel reward was as potent a 
motivator as was the shock; in Experiment 
II the anticipated reward had only a small 
influence on retention. Analysis of the sub- 
jects participating in the two experiments 
provides one possible clue to explain the 
conflicting results. In the Weiner and Wal- 
ker study the subjects were paid volunteers, 
part of a permanent pool of paid subjects 
at the University of Michigan. Their pri- 
mary source of motivation to participate 
in experiments is monetary. In Experiment 
II the subjects were students enrolled in in- 
troductory psychology at the University 
of Minnesota. Their primary source of mo- 
tivation to participate in experiments is 
class credit. It is likely that in this situa- 
tion money is a more salient and effective 
motivator for the former than the latter 
subjects. That is, there may have been an 
interaction between the type of incentive 
and the motivations of the subjects. 


EXPERIMENT III


In the previous two studies the magni- 
tude of the positive incentive was varied. 
In Experiments III and IV only one mone- 
tary reward was employed, while the 
strength of the aversive shock was manipu- 
lated. The four incentive conditions were: 
win five cents for correctly recalling the 
stimulus; receive a small shock following 
incorrect recall; receive a larger shock fol- 
lowing incorrect recall; and a control con- 
dition. The peak voltage of the smaller 
shock was in the order of 175 volts, while 
the larger pulse shock approximated 250 
volts. Both shock intensities decayed to 
near zero after 3 milliseconds. 

Inasmuch as differential KOR did not 
account for the divergent decay rates in the 
previous experiments, only the feedback 
conveyed by the shock was used. This pro- 
cedure minimizes experimenter-subject in- 
teractions. Subjects were 24 male students 
enrolled in introductory psychology at the 
University of Minnesota. 

Fig. 3. Percentage retention at two time in-
tervals as a function of incentive, employing two
levels of shock intensity.


Results 


In Figure 3 the percentage of correct re- 
sponses over the two time intervals is 
plotted. An analysis of variance indicates 
that the main effect does not reach statisti- 
cal significance, F(3, 69) = 2.72, p < .10, 
while the Time X Incentive interaction does 
not approach significance, F(3, 69) < 1. 
Further analysis reveals that all three mo- 
tivational conditions differ significantly 
from the control group (p < .01), but not 
from one another. 


Discussion 


The results contrast with the previous 
findings in that the differences between the 
motivational and nonmotivational condi- 
tions at the short time interval are as great 
as the differences exhibited at the long in- 
terval. No definitive conclusions concerning 
retention can be drawn from this study. 
However, it is of interest to note that the 
recall of stimuli in the low shock intensity 
condition is almost identical with the re- 
call in the high intensity condition. 


EXPERIMENT IV


Experiment III was attempted again with 
some procedural modifications. It was 
thought that the absence of differences in 
recall between the shock conditions in Ex- 
periment III may have been caused by a 
failure to discriminate the smaller from 
the larger shock. In the present study the
two intensities were differentially raised: 
the intensity of the smaller shock was in 
the order of 200 volts, while the larger shock 
approximated 300 volts. Also, the retention 
intervals were altered, and a third recall 
point was added. The three time intervals 
were: 1.87 seconds, 7.50 seconds, and 17.0 
seconds. Prior to the test trials six practice 
trials were administered. There were 72 
test trials, 6 for each of the 12 (four in- 
centives X three time intervals) experi- 
mental conditions. In the first and last 36 
trials each of the 12 conditions appeared 
three times; within each of the 36 trials 
the order of presentation was randomized.
Subjects were 24 male students enrolled in 
introductory psychology. 


Results 


In Figure 4 the percentage of correct re- 
sponses is plotted for the four incentive 
conditions and the three time intervals. An 
analysis of variance indicates that there is 
a significant main effect attributable to the 
incentive, F(3, 69) = 4.70, p < .01. The
Incentive X Time interaction approaches
significance, F(6, 138) = 1.87, p < .10. A 
Newman-Keuls paired-mean test was per- 
formed to compare the results with earlier 
findings. The test reveals that there are no 
significant differences in recall at the short 
or intermediate time intervals. At the long
interval recall of stimuli associated with
either the larger or smaller shock is signifi-
cantly greater than the recall of the control
stimuli (p < .01) and stimuli linked with
a five-cent reward (p < .05).

Fig. 4. Percentage retention at three time in-
tervals as a function of incentive, employing two
levels of shock intensity.


Discussion 


As in the prior experiments, retention 
which was instrumental to the avoidance of 
shock was enhanced. However, the magni- 
tude of the shock was not related to recall, 
thus replicating the finding of Experiment
III.


The decay rate exhibited by the stimuli 
paired with a five-cent reward is interesting. 
At the intermediate time interval these 
stimuli were retained as well as the stimuli 
paired with shock, and recall was well 
under 100%. As the time interval length- 
ened, the anticipated five-cent reward did 
not enhance recall, replicating the results
of Experiment II. This writer can offer no 
explanations for some of the undiscussed 
differences found between Experiment III 
and Experiment IV (e.g., the differences in 
recall at the short interval found in Experi- 
ment III but not in Experiment IV). It also 
should be noted that in all conditions the
recall at the short time interval approxi-
mated 100%. Because prior studies in this
series indicated that differential recall was 
not to be attributed to differences in origi- 
nal learning, no attempt was made to keep 
recall at the short interval at or below 80%. 

The results do indicate another factor 
which must be specified when investigating 
the effects of motivation on retention. Pre- 
viously it was stated that the magnitude 
and type of incentive in part determine re-
tention. This must now be modified: only
in certain cases will the magnitude of the
incentive be a relevant variable. That is,
there is an interaction between the effects of 
the magnitude and type of incentive in- 
fluencing recall. 


EXPERIMENT V 


In Experiments I-IV the motivational 
manipulation involved the magnitude of 
positive and negative incentives. The focus 
of investigation considerably shifts in Ex- 
periment V. To account for the differential 
decay rates exhibited between motivational 


and nonmotivational conditions, Weiner 
and Walker (1966) suggested that the 
greater the motivation during the perception 
of an event, the less the likelihood that the 
trace of that event would be subject to 
retroactive interference. An earlier study by 
Prentice (1943) had found that differences 
in retention of material learned under high 
and low motivational conditions are maxi- 
mized when subjects are asked to recall 
following an interfering activity. Conse- 
quently, it is hypothesized that motiva- 
tional factors will have the greatest oppor- 
tunity of manifesting their influence when 
interference is maximal. That is, motiva- 
tional factors are expected to be most effi- 
cacious under conditions which maximize 
forgetting. 

In Experiment V, conditions were estab- 
lished to increase the amount of forgetting 
exhibited in the earlier studies. The incentives 
in this experiment are the same as 
those in Experiments I and II, which em- 
ployed 2 degrees of positive incentive. Dur- 
ing the interpolated time interval the sub- 
jects were required to read pairs of digits, 
add them, and state whether the total was 
odd or even. A metronome striking two 
beats per second paced this activity. Posner 
and Rossman (1965) previously demon- 
strated that this procedure greatly reduces 
retention. 

Other facets of the experimental design 
were identical with those of Experiment 
IV. In this and all future experiments there 
are three recall intervals. Subjects were 16 
male students enrolled in introductory psy- 
chology. 


Results 


Figure 5 shows the percentage of reten- 
tion at the three time intervals for the four 
incentive conditions. An analysis of vari- 
ance reveals that there is a significant main 
effect attributable to the incentive, F (3, 45) 
= 15.12, p < .01, and a significant In- 
centive X Time interaction, F(6, 50) = 
5.65, p < .01. A Newman-Keuls paired- 
mean test shows that at the long and inter- 
mediate time interval stimuli associated 
with shock are recalled significantly more 
often than the control stimuli (p < .01). At 
the intermediate interval stimuli paired 
Fig. 5. Percentage retention at three time intervals 
as a function of incentive, employing 2 
degrees of monetary reward and a more difficult 
interpolated activity. 


with shock also are recalled significantly 
more than stimuli paired with one-cent re- 
ward (p < .01). 


Discussion 


For the three identical incentive condi- 
tions in Experiment IV and Experiment V 
the respective total mean recall at the in- 
termediate interval was 71% and 63%; at 
the long interval recalls were, respectively, 
50% and 33%. The interpolated activity in 
Experiment V clearly had more detrimental 
effects on retention than the interpolated 
task employed in Experiment IV. In Ex- 
periment V the difference in the percentage 
of recall between the shock and control 
stimuli at the long time interval was 22%; 
in Experiment IV this difference was 16%. 
The respective differences in retention for 
the shock versus nickel condition at the long 
time interval were 15% and 12.5%. At the 
intermediate time interval the differences 
between the recall of shock and control 
stimuli were 24% in Experiment V and 10% 
in Experiment IV; differences in the recall 
between shock and nickel stimuli at the 
intermediate interval were 9% in Experi- 
ment V and —1% in Experiment IV. Hence, 
the difference between the retention of stim- 
uli associated with potent motivational 
factors as opposed to control stimuli or 
stimuli associated with a less powerful motivator 
is greater under conditions which 
minimize total retention. These results tend 
to support the hypothesis that motivational 
factors will be most salient given conditions 
which maximize forgetting. 

This hypothesis, however, is not sup- 
ported when restricting the data analysis to 
Experiment V. In that experiment there was 
no interaction exhibited at the intermediate 
and long time interval between the shock 
and control stimuli. That is, the difference 
in retention between the shock and control 
stimuli did not increase as the total amount 
of retention progressively decreased. 

A better procedure for the above compari- 
sons would be to conduct two (or more) 
studies which vary only the difficulty of the 
interpolated activity. However, the general 
pattern of results in Experiments I-V does 
suggest that the influence of motivation on 
retention is in part a function of the magni- 
tude of incentive, type of incentive, and 
type (difficulty) of experience intervening 
between stimulus onset and recall. 


EXPERIMENTS VI-VIII 


In the preceding five experiments the in- 
centive cue was presented simultaneously 
with the onset of the stimulus. The tem- 
poral sequence of events was: 


incentive cue 
stimulus 
↓ 
interpolated activity 
↓ 
recall 


Therefore, the motivational factors were 
introduced during the period of trace for- 
mation. In the following experiments the 
temporal locus of the motivational manipu- 
lation is altered so that it cannot influence 
the strength of the original association. 
The two additional experimental para- 
digms in Experiments VI-VIII are: 


(a) 
stimulus 
↓ 
interpolated activity 
incentive cue 
recall 

(b) 
stimulus 
interpolated activity 
incentive cue 
↓ 
recall 


Thus, the motivational manipulation is exe- 
cuted during the periods of trace storage 
and trace retrieval. Introducing the in- 
centive cue simultaneous with stimulus 
onset or during the interpolated activity 
theoretically might influence the course of 
trace decay during the storage period. 


EXPERIMENT VI 


The procedure combines various condi- 
tions used in prior experiments. Two degrees 
of monetary incentive (Experiments I, II, 
V), three time intervals (Experiments IV, 
V), and the normal interpolated activity 
(Experiments I-IV) were used. All stimuli 
are projected on a blank background. Dur- 
ing the recall interval the color cue is pro- 
jected. The temporal sequence of events 
therefore is: 


stimulus 
interpolated activity 


incentive cue 
recall 


Subjects were 16 male students enrolled 
in introductory psychology. 


Results 


The results indicate that there are no 
significant differences between the recall of 
stimuli as a function of the incentive con- 
dition, F < 1. 


EXPERIMENT VII 


To increase the amount of motivation to 
retrieve a stimulus the procedure was modi- 
fied to include only two conditions. In one 
condition recall was rewarded with five 
cents while nonrecall was punished with a 
shock. In the second condition neither shock 
nor money was associated with the outcome. 
Other aspects of the procedure were identi- 
cal with Experiment VI. 


Results 


There are again no significant differences 
between the recall of stimuli in the two 
motivation conditions, F < 1. To indicate 
the equality of recall, there are 21.4% in- 
correct responses in the control condition 
and 23.1% incorrect responses in the incentive 
condition. 


EXPERIMENT VIII 


It was noted that some subjects tend to 
emit their responses during the interslide 
interval, that is, before the onset of the 
incentive cue. Therefore the cue was 
brought forward in the temporal sequence of 
events and presented prior to the recall 
period: 


stimulus 
interpolated activity 
incentive cue 


recall 


The cue was on for 7.5 seconds, and the 
recall period was an additional 6 seconds. 
There were 60 randomized trials, 10 for 
each experimental condition (three time 
intervals X two incentive conditions). All 
other procedures were identical with Experiment 
VII. 
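
The trial structure just described (three time intervals crossed with two incentive conditions, 10 trials per cell, 60 trials in random order) can be sketched as follows. This is an illustration only: the condition labels, the function name `build_schedule`, and the seeded shuffle are assumptions, since the report does not state how the randomization was carried out.

```python
import random
from itertools import product

# Labels are illustrative; the report names the factors, not these strings.
INTERVALS = ("short", "intermediate", "long")
INCENTIVES = ("incentive", "control")


def build_schedule(trials_per_cell=10, seed=None):
    """Return a shuffled list of (interval, incentive) trials."""
    cells = list(product(INTERVALS, INCENTIVES))  # 3 x 2 = 6 conditions
    trials = [cell for cell in cells for _ in range(trials_per_cell)]
    random.Random(seed).shuffle(trials)  # assumed randomization step
    return trials


schedule = build_schedule(seed=0)
print(len(schedule))  # 60
```

Each of the six cells contributes exactly 10 trials, so any seed yields the same cell counts and only the order differs.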


Results 

As in Experiments VI and VII, there are 
no significant differences in recall as a function 
of the incentive conditions, F < 1. 


Discussion 


At this point the data indicate that pre- 
senting the motivational source during or 
immediately prior to recall does not en- 
hance retention. Therefore, the effects of 
motivation on retention are dependent upon 
the magnitude and type of incentive, nature 
of the experience intervening between stim- 
ulus onset and recall, and the memory proc- 
ess accompanying the motivational input. 

A direct comparison between the effects 
of money and/or shock presented during 
the time of learning and during the period 
of retrieval is possible by comparing recall 
in Experiment IV with recall in Experiment 
VI. These studies employed identical 
shock, nickel, and control conditions; three 
time intervals for recall; and the same in- 
terpolated task. However, the motivational 
cueing occurred at different places in the 
memory sequence. (Although Experiment 
VII combined the shock and nickel condi- 
tions, the data were very similar to the 
results in Experiment VI; Experiment VII 
therefore is included in the following com- 
parison. Experiment VIII is not considered 
because the procedural change, unex- 
pectedly, dampened total recall.) In Experi- 
ments IV, VI, and VII the recall of stimuli 
associated with shock was extremely con- 
sistent, varying between 76% and 77%. 
Similarly, recall of stimuli associated with a 
five-cent reward varied only between 73% 
and 77%. On the other hand, in Experiment 
IV there was 67% recall of control stimuli, 
while in Experiments VI and VII recall of 
the control stimuli was respectively 78% 
and 79% (see Figure 6). 

Two very different interpretations of the 
data shown in Figure 6 are offered. It may 
be that the differences in retention ex- 
hibited in Experiments I-IV (cue at onset 
of stimulus) are not due to an enhanced 
retention of stimuli associated with positive 
or negative incentives. Rather the differ- 
ences in recall are to be attributed to a 
relative decrement in the retention of the 
Fig. 6. Average percentage retention of control 
and shock stimuli in Experiment IV (cue at 
stimulus onset) and Experiments VI and VII 
(cue during trace retrieval). 


control stimuli. An alternative explanation 
is that in Experiments VI-VIII the individ- 
uals acted as if all the stimuli were associ- 
ated with an incentive until the motiva- 
tional cue actually was presented. That is, 
the subjects responded to the onset of the 
stimulus with heightened motivation be- 
cause the recall of the stimulus might have 
a motivational consequence. Given this 
interpretation, Experiments VI-VIII are 
somewhat analogous to a situation in which 
the subject is confronted with a random 
partial reinforcement schedule. While engaging 
in the instrumental behavior (storage) 
it is not possible for the organism to 
predict the actual outcome; the individual 
therefore behaves identically in the motiva- 
tional and “nonmotivational” conditions. 
This interpretation suggests that there were 
no "control" stimuli in Experiments VI- 
VIII. The sequence of events would be 
portrayed as: 


(a) Motivational condition 
motivational manipulation 
↓ 
stimulus 
interpolated task 
↓ 
incentive 
recall 


(b) Control condition 
motivational manipulation 
↓ 
stimulus 
interpolated task 
↓ 
recall 


The conclusion from experiments cueing 
during recall would continue to be that 
motivational manipulations during the retrieval 
process do not enhance recall. However, 
the possible effect could have been 
dampened by the sequentially prior ex- 
pectation of an incentive. This interpreta- 
tion implies that the differential recall ex- 
hibited in Experiments I-V is attributable 
to an absolute enhancement in the recall of 
stimuli paired with a motivational variable. 

The analysis presented in the above 
paragraph indicates that the methodology 
employed in the prior experiments might be 
inadequate to ascertain the effects of moti- 
vation on memory when cueing occurs dur- 
ing trace retrieval. To investigate this 
problem a one-trial experiment must be 
conducted in which the individual is not 
aware of a potential motivational outcome 
prior to the period of trace utilization. Ex- 
periment IX creates these conditions. 


EXPERIMENT IX 


Subjects were 164 male students enrolled 
in the introductory psychology course. The 
experiment was administered on four occa- 
sions in one evening to groups ranging in 
size from 20 to 80. Subjects were asked not 
to divulge the nature of the experiment to 
the incoming groups. 

A three-page booklet was randomly dis- 
tributed to the subjects; the first page of 
the booklet was blank. Subjects silently 
read the directions printed on page 2 of the 
booklet. Two groups, produced by two 
types of instructions, were created: an 
Intentional Learning group and an Inci- 
dental Learning group. Their respective in- 
structions were: 


(a) You will be seeing a series of words on the 
screen. Following the list presentation you will be 
asked to recall the words. Pay close attention to 
the words as they are flashed. 

(b) You will be seeing a series of words on the 
screen. These are practice trials to familiarize you 
with the experimental procedure which we will be 
using in the actual experiment which follows. 


Prior evidence that motivational factors in- 
fluencing recall are most effective when for- 
getting is maximized suggested that there 
might be an interaction between degree of 
original learning and subsequent trace re- 
trieval in motivational and nonmotivational 
conditions. For this reason two groups ex- 
pected to differ in their original learning 
were created. 

Thirty-six nouns were then projected on 
a screen. Each stimulus word was visible for 
8 seconds, with a .75-second interslide inter- 
val. Twelve words were classified as AA 
on the Thorndike-Lorge (1944) word count, 
12 appeared 30-40 times per million, and 
12 appeared once per million. The 36 stimuli 
were presented in a randomized order. 


Following the presentation of the stimuli 
subjects silently read the instructions on the 
next page of the booklet. Three groups were 
created by varying the quality and magni- 
tude of the motivation aroused following 
stimulus presentation. For one group, 
money was offered as an incentive for 
stimulus recall; for the second group, 
achievement motivation was aroused (McClelland, 
Atkinson, Clark, & Lowell, 1953); 
a third group was a control group. The re- 
spective directions for these groups were: 


Now write down the words which were pre- 
sented on the screen. The order of recall is not 
important. Just write them down as they occur to 
you. You can guess if you are uncertain. You will 
have five minutes for this task. 

(a) For every word correctly recalled you will 
win five cents. You can, therefore, win almost 
$2.00. You will be paid immediately following the 
experiment. 

(b) We have found in the past that the ability 
to remember words is related to general success 
on exams. So try your best so that your perform- 
ance reflects your ability. 

(c) When the five minutes are up, we will col- 
lect the booklets. Remain seated during the entire 
five minutes. 


There were six experimental groups (two 
learning conditions X three retrieval condi- 
tions). It was hypothesized that the In- 
tentional Learning group would recall more 
than the Incidental group, the Achievement 
and/or Monetary Reward group would re- 
call more stimuli than the control group, 
and that there would be an interaction be- 
tween the degree of original learning and 
the motivational condition at the time of 
retrieval. 


Results 


An analysis of variance yielded the ex- 
pected difference in recall between the 
groups which differed in their instructions 
prior to learning, F (1, 158) = 8.51, p < .01. 
However, there were no significant differ- 
ences in recall between the groups which 
differed in strength of motivation during 
retrieval, F < 1, nor any evidence for the 
expected interaction, F < 1. 
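
As a quick check on the reported degrees of freedom, the partition for a 2 x 3 between-subjects analysis of variance with 164 subjects can be worked out directly. The sketch below assumes a standard fully crossed fixed-effects design with one score per subject; the function name `anova_df` is illustrative and not taken from the report.

```python
def anova_df(n_subjects, levels_a, levels_b):
    """Degrees of freedom for a two-way between-subjects ANOVA."""
    return {
        "A": levels_a - 1,
        "B": levels_b - 1,
        "AxB": (levels_a - 1) * (levels_b - 1),
        "error": n_subjects - levels_a * levels_b,
    }


# Two learning conditions x three retrieval conditions, 164 subjects.
df = anova_df(n_subjects=164, levels_a=2, levels_b=3)
print(df["A"], df["error"])  # 1 158
```

The learning-instruction effect therefore has 1 and 158 degrees of freedom, which agrees with the F (1, 158) reported above.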


Discussion 


The results support the previous conclusion 
that motivational input during the period 
of trace retrieval does not enhance recall. 
This finding is somewhat disconcerting 
because other investigators (e.g., Blum, 
1961; Bourne, 1955) have established that 
motivational manipulations at the time of 
retrieval do affect recall. At the present 
time the investigator cannot account for 
this contradiction in results. The great dif- 
ferences between the experiments of 
Bourne, Blum, and these studies do not per- 
mit critical methodological comparisons. 


EXPERIMENTS X-XV 


There remains the perplexing analysis 
which suggested that the differences in re- 
call found between conditions in Experi- 
ments I-V are not due to an enhancement 
of the recall of stimuli in the motivational 
condition. Rather, they seem to be caused 
by a relative decrement in the recall of the 
control stimuli. Before considering the the- 
oretical implications of this finding, it is 
necessary to determine conclusively the 
facts. Experiments X-XV reveal that this 
is a more difficult problem than one would 
anticipate. 


EXPERIMENT X 


Experiment X was conducted simultane- 
ously with Experiments VI-VIII as part of 
a master's thesis by Kernoff (1965; reported 
in Kernoff, Weiner, & Morrison, 1966). The 
study is not directly related to the problems 
discussed above, but the findings lead 
to some procedural changes employed in 
Experiments XI-XV. 

Studies I-V are conceivably subject to 
some methodological criticisms. Differences 
in learning in those investigations may 
have been masked because performance at 
the short interval was comparatively near 
asymptote and a relatively insensitive 
measure of learning was used (cf. Under- 
wood, 1964). Experiment X attempts to 
replicate the basic findings of Weiner and 
Walker, employing two methodological 
changes. First, four-letter consonant stimuli 
(quadrigrams) rather than trigrams were 
the to-be-remembered units. The stimuli 
were formed by adding a consonant to the 
trigrams. The consonant was not repeated 
in the trigram, and “v” and “w” were ex- 
cluded. The stimuli were projected on the 
screen for 1 second. A second change was 
instituted to increase the sensitivity of the 
response indicator. Each response was 
evaluated on an eight-point scale of ap- 
proximation to the correct response. Re- 
sponses were scored one point for each con- 
sonant recalled, and two points for each 
consonant recalled in its correct position. 
The maximum score of eight was received 
for perfect recall. There were three recall 
time intervals: 2.8 seconds, 9.35 seconds, 
and 17 seconds. Subjects were 20 male stu- 
dents enrolled in introductory psychology. 
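
The eight-point scoring rule can be sketched as below. The reading adopted here is an assumption: a target consonant earns two points when it is reported in its correct position and one point when it is reported in any other position. The function name `score_response` and the example stimulus "BKTR" are hypothetical.

```python
def score_response(response, target):
    """Score a recalled quadrigram against the target on a 0-8 scale.

    Assumed reading of the rule: 2 points for a target consonant given in
    its correct position, 1 point for a target consonant given elsewhere.
    Targets are assumed to contain no repeated consonants.
    """
    response = response.upper()
    target = target.upper()
    score = 0
    for position, letter in enumerate(target):
        if position < len(response) and response[position] == letter:
            score += 2  # right consonant, right position
        elif letter in response:
            score += 1  # right consonant, wrong position
    return score


print(score_response("BKTR", "BKTR"))  # 8
print(score_response("KBTR", "BKTR"))  # 6
```

A graded score of this kind separates partial from complete forgetting, which is the added sensitivity the all-or-none measure lacked.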


Results 


The mean response score for each incen- 
tive condition at the three time intervals is 
shown in Figure 7. An analysis of variance 
reveals a significant main effect due to the 
incentives, F(3, 57) = 8.70, p < .01. The 
Time X Incentive interaction does not ap- 
proach significance, F < 1. Paired-mean 
tests were employed to compare these re- 
sults with the earlier findings. A Newman- 
Keuls test reveals that there are no signifi- 
cant differences in recall at the short time 
interval. At the intermediate interval stim- 
uli associated with shock are retained sig- 
nificantly more than the control stimuli 
(p < .05). At the long interval stimuli as- 
sociated with shock are recalled signifi- 
cantly more than stimuli associated with a 
one-cent reward (p < .05) and the control 
stimuli (p < .01). Stimuli paired with a 
five-cent reward also are recalled signifi- 
cantly more than the control stimuli (p < 
.05) at the long time interval. 


Discussion 


The general pattern of results replicates 
the findings of Weiner and Walker, save 
that the effectiveness of the nickel is not as 
great. The results of this preliminary inves- 
tigation are essential for the interpretation 
of the findings in Experiments XI-XV. 


EXPERIMENT XI 


This experiment directly attacks the is- 
sue concerning the absolute versus relative 
effects of motivation on retention. In the 
previous studies in this series a "differen- 
tial" experimental method has been used 
(Lawson, 1957). That is, each subject 
served as his own control and received mul- 
tiple stimulus presentations. This is a 
within-subjects experimental design. A sec- 
ond possible procedure is known as the “ab- 
solute" method (Lawson, 1957). In that 
procedure different subjects are used in dif- 
ferent experimental conditions; this is a 
between-subjects design. 

The two different experimental ap- 
proaches have yielded disparate results in 
psychological research. A number of studies 
within the domain of motivation report 
an expected behavioral difference as a 
function of motivation when the differen- 
tial method is employed, but the findings 
have not been replicated when using an ab- 
solute method. Lawson (1957) found that 
"visual discrimination performance varies 
with incentive amount only when Ss have 
experience with different amounts in asso- 
ciation with different stimuli [p. 39]." Pu- 
bols (1960), summarizing the research on 
incentives, concluded that incentives affect 
learning when training is by the differential 
rather than by the absolute method. In two 
recent studies of human learning, Harley 
(1965a, 1965b) has confirmed Pubols’ con- 
clusions. Similarly, Grice and Hunter 
(1964), in an article appropriately entitled 
“Stimulus intensity effects depend upon the 
type of experimental design,” found that 


“substantially greater (signal intensity) 
effects are obtained if individual Ss are ex- 
posed to the different intensities than if 
each S experiences only one intensity value 
[p. 247]." Further, Wright (1965) was able 
to establish a learned hunger drive when a 
within-subjects experimental design was 
used, whereas numerous previous investiga- 
tors were not able to find this result using 
between-subject comparisons. 

One method to determine the relative 
versus absolute effect of motivation on 
memory is to compare results of studies us- 
ing the two procedures outlined above. Grice 
and Hunter (1964) state that “this turns 
out to be a reasonable, but neglected, ex- 
perimental design in psychological research 
[p. 248]." If the recall of control stimuli in 
the absolute method exceeds the recall of 
control stimuli in the differential method, 
then the differential procedure results in a 
decrement in the recall of those stimuli. 
Motivation would then relatively but not 
absolutely enhance recall. On the other 
hand, if the recall of the control stimuli in 
the absolute condition equals the recall in 
the differential condition, then the recall of 
stimuli associated with incentives must 
have been absolutely enhanced in Experi- 
ments I-V. This analysis can be expressed 
somewhat differently to include the recall 
of stimuli associated with a motivational 
factor in an absolute procedure. If the dif- 
ferential method was effective because there 
was a decrement in the recall of the control 
stimuli, then in the absolute procedure 
there should be no difference between the 
recall of control and motivational stimuli. 
Conversely, if motivation absolutely en- 
hances retention, then in the between-sub- 
jects design the recall of stimuli associated 
with an incentive should exceed the recall 
of the absolute control stimuli. 

In Experiment XI the conditions neces- 
sary to test these comparisons are estab- 
lished. Some adjustments are made in the 
experimental procedure. The previous stud- 
ies consistently have shown no differences 
in recall between the penny and control 
condition. The smaller monetary reward is 
therefore not adding any information. In 
the remaining studies there are only three 
experimental conditions: shock, five-cent 
reward, and a control condition. The stimuli 
in Study XI were the quadrigrams used in 
Experiment X. Also, feedback was con- 
veyed following correct recall in the five- 
cent condition. A pleasant bell was sounded 
to signal a monetary reward. 

Subjects were 57 male students enrolled 
in the introductory psychology course at 
the University of California, Los Angeles. 
Eighteen subjects were tested using the 
customary differential method; 18 were in 
an absolute control group, and 21 were in 
the absolute shock incentive group. Establishing 
the latter condition created a number 
of difficulties. If there is a potential 
shock on every trial, then the total number 
of shocks received in the absolute and dif- 
ferential procedures would differ. This 
could result in some adaptation and a loss 
of the motivational effectiveness of the 
shock in the absolute method. It is possible 
to equate the potential number of shocks 
by giving only 24 trials in the absolute 
procedure, inasmuch as 1/3 of the 72 trials 
are cued for shock in the differential 
method. However, this would confound the 
amount of practice and proactive inhibition 
between the two methods. An alternative 
procedure is to convey to the subject that 
on a randomly selected 1/3 of the trials he 
may receive shock if incorrect. This equates 
the number of potential shocks in the between-subjects 
and within-subjects methods. This procedure was used in the 
between-subjects design. The trials associated 
with shock were isomorphic with the potential 
shock trials in the differential method. 

Fig. 8. Percentage retention at three time intervals 
as a function of incentive, employing 
quadrigrams as stimuli and the differential procedure. 

There is no absolute five-cents reward 
condition. The findings for the shock condi- 
tion are sufficient to test the hypotheses 
under consideration. 
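
The equating of potential shocks across the two methods can be illustrated with a short sketch. It assumes that the 72 differential trials are divided evenly among the shock, nickel, and control cues, and that the absolute shock group receives potential shock on the same trial positions. The names, labels, and seeded shuffle are illustrative and not taken from the report.

```python
import random

N_TRIALS = 72
CONDITIONS = ("shock", "nickel", "control")  # differential-method cues


def differential_schedule(seed=None):
    """72 trials, one third cued for each condition, in random order."""
    per_condition = N_TRIALS // len(CONDITIONS)  # 24 trials each
    trials = [cue for cue in CONDITIONS for _ in range(per_condition)]
    random.Random(seed).shuffle(trials)  # assumed randomization step
    return trials


def absolute_shock_schedule(differential):
    """Mark the same trial positions as potential-shock trials.

    The 'no shock' label is only bookkeeping for this sketch; it is not
    a cue shown to the subject.
    """
    return ["shock" if cue == "shock" else "no shock" for cue in differential]


diff = differential_schedule(seed=1)
absolute = absolute_shock_schedule(diff)
print(diff.count("shock"), absolute.count("shock"))  # 24 24
```

Both groups complete all 72 trials, so practice and proactive inhibition are matched while the number of potential shocks stays at 24.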


Results 


In Figure 8 the percentage retention scale 
scores for the three conditions in the differ- 
ential procedure are plotted. The data repli- 
cate the finding that stimuli associated with 
a motivational outcome are retained signifi- 
cantly more than the control stimuli. In 
contrast to our previous studies, the nickel 
again is as potent a determinant of reten- 
tion as shock. An analysis of variance on 
this data indicates a significant main effect 
due to the incentive, F (2, 34) = 13.29, p < 
.01. The Incentive X Time interaction ap- 
proaches significance, F (4, 68) = 2.06, p < 
.10. A Newman-Keuls test reveals that 
there are no significant differences in recall 
at the short time interval. At the medium 
and longer interval recall of the nickel and 
shock stimuli is significantly greater (p < 
.01) than the recall of the control stimuli. 

In the absolute or between-subjects com- 
parison respective recall at the short, me- 
dium and long intervals in the shock condi- 
tion is 89%, 75%, and 61%. In the control 
condition the respective recall is 87%, 72%, 
and 61%. There clearly is a minimal differ- 
ence in recall between the two conditions. 

Table 1 compares the total recall in the 
within-subjects and between-subjects procedures. 
The table reveals that the recall of 
the control stimuli in the absolute method 
(hereafter referred to as the straight control 
stimuli) is greater than the recall of the con- 
trol stimuli in the differential procedure, and 
approximates the recall of the shock stim- 
uli in the differential procedure. 
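
The two comparisons proposed for separating relative from absolute effects can be written out as a small decision rule. The sketch below is a simplification: the 5-point `tolerance` is an arbitrary stand-in for a significance test, and the function name `interpret` is hypothetical. The input values are the Experiment XI percentages reported in Table 1.

```python
def interpret(diff_control, abs_control, abs_shock, tolerance=5.0):
    """Classify a result pattern using the two comparisons in the text.

    All arguments are percentage retention scores. `tolerance` (in
    percentage points) is an assumed threshold, not a statistical test.
    """
    # Comparison 1: do straight control stimuli beat differential controls?
    control_decrement = abs_control - diff_control > tolerance
    # Comparison 2: does absolute shock beat absolute control?
    absolute_gain = abs_shock - abs_control > tolerance
    if control_decrement and not absolute_gain:
        return "relative"
    if absolute_gain and not control_decrement:
        return "absolute"
    return "inconclusive"


# Experiment XI: differential control 63, absolute control 74, absolute shock 75.
print(interpret(63, 74, 75))  # relative
```

Under this rule a "relative" outcome means that the differential procedure depressed recall of the control stimuli, whereas an "absolute" outcome means that the incentive raised recall above an undisturbed control level.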


Discussion 


The results seem to confirm the previous 
suspicion that the retention of stimuli as- 
sociated with motivational factors is not 
absolutely enhanced. Rather, there appears 
to be a decrement in the recall of stimuli 
TABLE 1 


SUMMARY OF RESULTS, EXPERIMENTS XI-XV, WITH STUDIES GROUPED ACCORDING 
TO TYPE OF STIMULI 


           |             |          |                       | Differential procedure  | Absolute procedure 
Experiment | Stimuli     | Subjects | Onset of cue          |  N | Control | Shock(a) |  N | Control |  N | Shock(a) 
XI         | Quadrigrams | Male     | Stimulus              | 18 |   63    |    72    | 18 |   74    | 21 |    75 
XV         | Quadrigrams | Male     | Interpolated activity | 18 |   65    |    70    | 18 |   71    |    | 
XII        | Trigrams    | Male     | Stimulus              | 18 |   80    |    89    | 18 |   78    | 21 |    93 
XIII       | Trigrams    | Male     | Interpolated activity | 18 |   81    |    88    | 18 |   77    |    | 
XIV        | Trigrams    | Female   | Interpolated activity | 18 |   74    |    8     | 18 |   77    |    | 


(a) Percentage retention in scale scores. 


coupled with a nonmotivational condition 
in the differential procedure. The results 
also substantiate the general conclusions of 
Pubols and Grice and Hunter that between- 
subjects designs are less likely to yield an 
expected motivational effect than a within- 
subjects design. Further discussion of the re- 
sults of this investigation and the remaining 
studies will be postponed until the entire 
series of investigations (XI-XV) is pre- 
sented. 


EXPERIMENT XII 


The findings in Experiment XI were of 
sufficient import to warrant a replication. 
One change was made in the experimental 
procedure: trigrams rather than quadri- 
grams are the to-be-remembered stimuli. 
In retrospect, the exact reason for this 
modification is somewhat obscure. The best 
guess is that it was decided to reinstate 
some of the conditions used in the original 
Weiner and Walker study. Subjects were 
57 male students enrolled in introductory 
psychology. Eighteen subjects were tested 
with the differential method. There were 
18 subjects in the absolute control condi- 
tion, and 21 in the absolute shock condi- 
tion. 


Results 


Figure 9 portrays the results when the 
differential method was employed. As ex- 
pected, there is a significant main effect due 
to the incentive, F (2, 34) = 17.38, p < .01. 
The Time x Incentive interaction also is 


significant, F(4, 68) = 5.10, p < .01. A 
Newman-Keuls test shows that there are 
no significant differences in retention at the 
short interval. At the intermediate interval 
stimuli cued for shock are retained signifi- 
cantly more than the control stimuli, (p < 
.01). At the longer interval both shock 
stimuli and stimuli cued for five cents are 
recalled significantly more than the control 
stimuli, (p < .01). 

In the between-subjects design recall at 
the three time intervals in the shock condi- 
tion is 99%, 92%, and 87%. For the absolute 
control group recall at the three intervals is 
97%, 74%, and 65%. As in the differential 
method, recall of motivational stimuli is 
greater than the recall of control stimuli 
(t = 4.32, df = 37, p < .01). 

Table 1 shows the total recall of shock and 
control stimuli in the within-subjects and 
between-subjects procedures for Experi- 
ment XII. It is clear from the table that the 
two procedures yield virtually identical re- 
sults. In both conditions there is a significant 
difference in recall between the motiva- 
tional and nonmotivational stimuli. 


Discussion 


The results certainly were surprising. 
The data indicate that there is an absolute 
enhancement of the retention of stimuli 
associated with a motivational factor. The 
results and inferences of Experiment XI 
therefore are not substantiated. 

The only purposive change between Ex- 
periments XI and XII was that Experiment 
XI employed four-letter consonants as 
stimuli, while in Experiment XII the stim- 
uli were three-letter consonants. 


Experiment XIII 


It was essential to attempt to replicate 
the apparently conflicting results of the 
prior two studies. In Experiment XIII tri- 
grams were again the to-be-remembered 
units (as in Experiment XII). Two changes 
were made in the experimental design. 
First, there was no absolute shock condition.
The significant comparisons in the
prior two studies concerned the recall of
control stimuli in the absolute and differential
procedures. Hence only an absolute
control condition is included in this experiment.
Secondly, the motivational cue
was presented at the onset of the interpolated
activity, rather than at the onset of
the stimulus. One long-term goal of this
research program is to determine the place
in the memory sequence at which the motivational
input will maximally affect retention.
Earlier studies in this series have indicated
that motivational input at the time
of stimulus presentation does influence recall,
but this is not true when the input
occurs during the period of retrieval. In this
study the motivational cue appears after
the offset of the stimulus, during the period
of trace storage. Further implications of
this change will be discussed later in the
paper. While it is indeed risky to vary a
factor when attempting to replicate a finding,
the author decided to take this risk
because of the multitude of problems which
needed exploration. If the experiment replicated
the prior results, in spite of the induced
change, then two findings would
emerge from one experiment.

Fig. 10. Percentage retention at three time intervals
as a function of incentive, employing
trigrams as stimuli, the absolute and differential
procedures, and cueing for motivation at the
onset of the interpolated activity.

Eighteen male students were tested with 
the differential procedure, and 18 were in 
the absolute control condition. 


Results 


Figure 10 illustrates the recall in the 
three conditions tested with the differential 
procedure and the recall of the straight 
control stimuli. Again with the differential 
procedure there is a significant main effect 
attributable to the incentive, F(2, 34) = 
11.77, p < .01, and a significant Time x
Incentive interaction, F(4, 68) = 5.73, p <
.01. Recall of the control stimuli does not
differ from the recall of the motivational 
stimuli at the short time interval. At the 
medium interval the shock and nickel stim- 
uli are recalled significantly more than the 
control stimuli (p < .05). The differences
in recall also are exhibited at the long 
interval, (p < .01). The recall of the abso- 
lute control stimuli is virtually identical to 
the recall of the control stimuli in the dif- 
ferential procedure. The respective total 
recall in the differential and absolute procedures
is given in Table 1.


Discussion 


The results replicate the findings of Ex- 
periment XII. As Table 1 shows, the pat- 
tern of results in Experiments XII and 
XIII is almost identical. The data consistently
demonstrate that the retention of
trigrams associated with a potential shock
is absolutely enhanced.


Experiment XIV 


To be absolutely certain about the relia- 
bility of the findings in Experiments XII 
and XIII, the experiment was conducted 
again. In Experiment XIV there is one pro- 
cedural change: subjects are females. In 
all the prior studies the subjects were males. 
The use of males in the first study by 
Weiner and Walker was entirely chance; 
males happened to be available in the sub- 
ject pool. After the initial finding the ex- 
perimenter was somewhat reluctant to 
include female subjects because sex differ- 
ences pervade so many problem areas in 
psychology. An unreported pilot study con- 
ducted earlier in this series did reveal that 
females were behaving differently than 
males in situations employing “right” and 
“wrong” feedback, but were replicating the 
male results when only shock feedback was 
used. It was therefore decided not to use 
females in the ensuing experiments. The 
major impetus for Experiment XIV was 
the alarming availability of female sub- 
jects in introductory psychology and a 
scarcity of male subjects. The experimental 
procedure was identical with that in Ex- 
periment XIII. 


Results 


The pattern of recall at all time intervals 
is virtually identical with the recall in the 
previous study. In the differential method 
there is a main effect attributable to the incentive
conditions, F(2, 34) = 11.50, p <
.01. The Time x Incentive interaction does
not reach significance, F(4, 68) = 1.70,
p < .25. There is the familiar difference in
recall at the medium and long interval be- 
tween the shock and nickel versus control 
stimuli (p < .01). The amount of recall in
the straight control condition is very similar
to recall of the control stimuli in the
differential procedure. Table 1 gives the 
total percentage recall in the absolute and 
differential procedures. 


Discussion 

Experiments XII and XIII were replicated.
The total recall for females was
slightly lower than that of males, but the
general results are strikingly consistent
(see Table 1).


Experiment XV


Experiments XII-XIV are convincing; 
with trigrams as stimuli there is an abso- 
lute facilitation in retention in the motiva- 
tional conditions. Experiment XV reverts 
to the use of quadrigrams as stimuli. The 
procedure is identical with that used in
Experiment XIII, which used male sub- 
jects. 


Results 


Figure 11 gives the retention for subjects 
in both experimental methods. In the differ- 
ential procedure there is the significant 
main effect attributed to the incentive, 
F(2, 34) = 3.54, p < .05. The Time x In- 
centive interaction approaches significance, 
F(4, 68) = 2.33, p < .10. There is no significant
difference in recall between the motivational
and control stimuli at the short
time interval. At the medium interval there 
is a significant difference between the recall 
of the nickel and control stimuli, p < .01. 
At the long interval both the shock and 
nickel stimuli are recalled significantly 
more often than the control stimuli, p < 
.05. The amount of recall in the absolute 
condition again exceeds the recall of the 
control stimuli in the differential procedure 
(see Table 1). 


Discussion 

The pattern of results replicates Experi- 
ment XI. With quadrigrams as stimuli 
there seems to be a relative decrement in 
the recall of control stimuli, rather than an 
enhancement in the retention of stimuli as- 
sociated with motivational factors. 


RECONSIDERATION OF EXPERIMENTS 
XI AND XII


The results in Experiments XI and XII 
have been replicated in three studies, and
at this time must be considered reliable. 
Table 1 presents the total recall of shock 
and control stimuli in Experiments XI-XV. 
With quadrigrams as stimuli, the table indicates
that the differences in recall in the
differential method appear to be caused by
a relative decrement in the retention of the
control stimuli. With trigrams as
stimuli, Table 1 reveals that the difference
in recall in the differential method seems to
be attributable to an absolute increment in
the retention of stimuli associated with
shock.

Before considering some general impli- 
cations of these findings, let us briefly re- 
turn to the data comparison which led to 
the questioning of the relative facilitation 
in the retention of the to-be-shocked stim- 
uli (see Figure 6). To recapitulate briefly, 
Figure 6 seems to indicate that there is a 
decrement in the recall of control stimuli 
cued for incentives during the period of 
stimulus onset. This was inferred from the 
data in Experiments VI-VIII, which found 
no difference in the retention of stimuli 
cued during the period of retrieval. It was 
then asked whether there was a decrement
in the recall of the control stimuli, or
whether cueing at retrieval rendered all the 
stimuli motivationally relevant. Experi- 
ments XI-XV provide evidence which 
favors the latter alternative. In the studies 
employing trigrams as stimuli (Experi- 
ments XII-XIV), there is an absolute in- 
crement in the recall of the potential shock 
stimuli. Experiments I-VIII employed tri- 
grams as stimuli. Hence in those experi- 
ments there also must have been a facilita- 
tion in the recall of the shock stimuli. The 
equality in the recall of both shock and con- 
trol stimuli when cueing during retrieval, 
and the equivalence of that recall with the 
recall of the to-be-shocked stimuli cued dur- 
ing stimulus onset, indicate that when cue- 
ing during retrieval the subject reacts to all 
the stimuli with heightened motivation. 

There are other, undoubtedly more significant,
implications of the results in Experiments
XI-XV. The differential procedure
yielded the same findings when either
trigrams or quadrigrams were the stimuli.
However, the addition of the absolute procedure
demonstrates a difference in the retention
function between the two types of
stimuli. The between-subjects design led to 
the differentiation of the results obtained 
with the within-subjects design. Stated 
somewhat differently, Experiment XI 
(quadrigrams as stimuli) yielded different 
results with different experimental proce- 
dures, while Experiment XII (trigrams as 
stimuli) yielded the same results across the 
different procedures. Thus, there is an in- 
teraction between the type of experimental 
design and the type of stimulus material. 
This interaction indicates that different ex- 
perimental interpretations are needed when 
trigrams and quadrigrams are the to-be-re- 
membered materials. More generally, the 
interaction suggests that research in psy- 
chology might profit from an experimental 
design similar to the multitrait, multi- 
method approach advocated by Campbell 
and Fiske (1959) in correlational research. 
Multimethod comparisons perhaps are a
necessary part of psychological research. 
Why is it that when trigrams are the to- 
be-remembered stimuli motivation abso- 
lutely enhances retention, while this does 
not appear to be the case when quadrigrams
are the stimuli? At this time the author can 
only vaguely speculate about some of the 
dimensions which distinguish the stimuli. 
First, there is absolutely greater retention 
of trigrams than quadrigrams. It was previously
suggested that the effects of motivation
on retention are in part a function
of the total amount of forgetting. The results
of Experiments XI and XII do not
support the hypothesis that motivational
factors are most effective when forgetting
is maximal. However, the general relationship
between motivation and total forgetting
might be a determinant of the strange
pattern of results exhibited in Experiments
XI-XV. A second difference between tri- 
grams and quadrigrams is their degree of 
meaningfulness. Trigrams are likely to be 
more meaningful than quadrigrams, al- 
though specific associative norms on quad- 
rigrams remain to be collected. Other re- 
search in the area of motivation and 
memory has indicated that meaningfulness 
is an important determinant of retention. 
White, Fox, and Harris (1940) found that 
during a hypnotic trance the recall of 
meaningful material was facilitated. This 
was not true for the recall of nonsense syl- 
lables. Rosenthal (1944), in a more com- 
plex study, also demonstrated that the re- 
call of meaningful material is enhanced if 
retrieval is during a hypnotic state. Mean- 
ingfulness may be an important dimension 
in the present research, and conceivably 
could be a significant factor differentiating 
trigrams from quadrigrams. Still a third 
difference between the stimuli may be that 
trigrams are more likely to be retained 
with the aid of special associative or mnemonic
devices. For example, one subject
reported that the stimulus XGP was easy to
remember because he formerly was an aspiring
general practitioner. Such strategies
perhaps are less possible with quadrigrams; 
this is an unmeasured dimension in the 
present research. 

It is also of interest to note the resur- 
gence of the potency of the five-cent reward. 
It is regrettable that the nature of the research
program makes some conclusions
very tentative. It may be that the nickel
was effective because the pleasant signal
accompanied the correct response. However,
this feedback was used in a new population,
and the nature of the subject population
may have been the significant variable.

In summary, the variables which affect
the motivation and memory linkage tentatively
include: the magnitude of incentive,
the type of incentive, the nature of the 
experiences intervening between stimulus 
onset and recall, the point in the memory 
sequence at which the motivation is intro- 
duced, the nature of the feedback, the 
stimulus material, the type of experimental 
design, and complex interactions between 
these variables. 


GENERAL ISSUES


Rehearsal 


In the earlier paper of Weiner and 
Walker the following statement was made 
concerning rehearsal: 


It might be argued that the interaction between 
the time interval and the incentive conditions 
was mediated by differential rehearsal of the 
stimuli. It is conceivable that the greater the in- 
centive value of the stimulus, the greater is the 
tendency of the subjects to repeat that stimulus.
The differential rehearsal hypothesis is especially 
provocative because it is generally accepted that 
learning increases as a function of the number of 
repetitions of the to-be-learned material. If moti- 
vational manipulations result in differential re- 
hearsal, then the degree of learning becomes 
confounded with the storage process, and demon- 
strating that motivation influences retention cer- 
tainly would be a formidable problem. In this 
experiment subjects were paced during the inter- 
polated task to minimize the amount of rehearsal; 
there is no evidence that subjects do or do not 
covertly rehearse one set of stimuli more than 
another set. [p. 192] 


There are now a number of strong argu- 
ments against the differential rehearsal ex- 
planation of the results. First, the speed 
and difficulty of the interpolated task lim- 
its the feasibility of this explanation. In 
addition, rehearsal would be most likely to 
occur during the .70-second interslide in- 
terval between the offset of the stimulus 
and the onset of the interpolated activity. 
Therefore, cueing at the onset of the in- 
terpolated activity (Experiments XIII- 
XV) should lessen the differences in recall 
between the motivational and nonmotivational
conditions as compared with studies
which cue at the onset of the stimulus. The 
data, however, do not support this supposi- 
tion. Further, it was found that a more 
difficult interpolated activity increased the 
differences in recall between the motiva- 
tional and control stimuli. This would not 
be expected if differential rehearsal caused 
the differences in retention. Rehearsal 
should be less likely to occur as the diffi- 
culty of the interpolated task increases. 
Finally, the differential rehearsal explana- 
tion would not shed any light upon the 
complex interaction found between the ab- 
solute and differential procedures and the 
stimulus material.


Repression 


Previous research (e.g., Clemes, 1964; 
Russell, 1952; Zeller, 1952) has suggested 
that events associated with unpleasant af- 
fective states tend to be “repressed”; that 
is, the traces of those events are unavail- 
able for immediate retrieval. Many of the 
experimental studies of repression are meth- 
odologically inadequate (cf. Weiner, 1966). 
However, there are some conclusive experi- 
mental demonstrations of the phenomenon 
(e.g., Clemes, 1964), and a great wealth of 
clinical observations (Freud, 1946) sup- 
porting the concept of repression. Con- 
versely, the studies reported in this mono- 
graph demonstrate that the retention of 
material associated with unpleasant events 
is enhanced, rather than dampened. 

A critical difference between the investi- 
gations presented here and previous re- 
search and observations concerning re- 
pressed material is that in the present 
procedure retention is instrumental to the 
avoidance of a potential shock. Repression 
is conceptualized as an ego function (Freud, 
1936) and is regulated in accordance with 
the pleasure-pain principle. Within the 
framework of analytic theory it therefore is 
quite conceivable that the retention of 
events associated with unpleasant outcomes 
will be facilitated, given proper circum- 
stances. In other discussions of repression, 
“forgetting” rather than retention is the 
more adaptive psychological process; in 
the present paradigm the reverse is true. 
This analysis is similar to Dulany’s (1957) 


view that both perceptual vigilance and 
perceptual defense can be exhibited, de- 
pending on the nature of the response in- 
strumentalities. 


Action Decrement 


Walker (1958) has presented a general 
theory of learning and retention which in- 
terrelates concepts of arousal, action decre- 
ment, and consolidation. The major pre- 
diction of his theory is that high arousal 
during learning makes the trace of the ma- 
terial less available for immediate recall, 
but results in greater permanent memory. 
These predictions have been substantiated 
(Kleinsmith & Kaplan, 1963; Walker & 
Tarte, 1963).

In the present studies, Walker’s predic- 
tions are not confirmed. Stimuli a priori 
considered to be highly arousing because of 
their association with an affective conse- 
quence are more likely to be recalled after 
a relatively short time interval than stimuli 
considered to be relatively low in arousal 
value. At this time the author cannot rec- 
oncile the results supporting Walker's the- 
ory with the data from the present series 
of studies. The experiments are quite dif- 
ferent, and methodological comparisons are 
not possible. However, the instrumentalities 
of retention in the present studies again 
may be a crucial difference. Walker con- 
siders the decrement in the availability of 
highly arousing stimuli to be an adaptive 
process which ultimately strengthens the 
stimulus trace. In the present investigations 
clearly the more adaptive procedure is to 
have the material immediately available 
for recall. 


Summary 


Fifteen studies were presented which in- 
vestigate the effects of motivation on mem- 
ory. In a variation of the Peterson and 
Peterson technique devised to study short- 
term retention, stimuli (trigrams) were 
cued for various incentives. For some stim- 
uli recall was rewarded with money, while 
for other stimuli nonrecall was punished 
with a shock. In Experiments I-IV the 
magnitude of the aversive shock and mone- 
tary reward was varied. Recall was en- 
hanced in the motivational conditions when 


compared to a control condition, although 
the magnitude of the potential shock was 
not related to recall. In these studies there 
were no differences in recall at a short 
time interval, and recall at that point was 
approximately 80%. Further, there were 
differential decay rates for identical stimuli. 
Therefore, differences in recall were attrib- 
uted to differences in retention (storage) 
rather than to differences in the degree of 
original learning. Experiment V demonstrated
that the differences in recall between
conditions are maximized as the task
interpolated between stimulus onset and
recall becomes more difficult.

In Studies I-V the onset of the stimulus 
and the motivational cue were presented 
simultaneously. In Experiments VI-IX the 
cue was presented immediately prior to or 
during the period of trace retrieval. There 
were no differences in recall between the 
conditions in these experiments. 

Experiments X-XV examine whether the 
differences in retention exhibited in Ex- 
periments I-V are to be attributed to a 
decrement in the retention of the control 


stimuli, or to an increment in the retention
of the motivational stimuli. To investigate 
this problem it is necessary to employ both 
between-Ss and within-Ss experimental de- 
signs. The stimuli were trigrams or quadri- 
grams (four-letter consonants). The find- 
ings appear to indicate that with trigrams 
as stimuli there is an absolute enhance- 
ment of the recall of stimuli associated with 
a motivational state. However, with quadri- 
grams as stimuli the differences in retention 
seem to be caused by a decrement in the 
recall of the control stimuli. These findings 
were consistent in five experimental inves- 
tigations. The results were replicated when 
the stimuli were cued either during stimu- 
lus onset or during the onset of the interpo- 
lated activity. Possible reasons for the un- 
expected pattern of results were discussed. 

The article concludes with a discussion 
of the relevance of this work to repression 
and action decrement. It also was con- 
cluded that differences in recall between 
motivational and nonmotivational condi- 
tions were not caused by differential re- 
hearsal of the stimuli. 


INTROSPECTIONIST AND BEHAVIORIST INTERPRETATIONS
OF RATIO SCALES OF PERCEPTUAL MAGNITUDES¹


C. WADE SAVAGE²
University of California, Los Angeles 


(a) The psychological magnitudes involved in perception are observable 
but private according to the introspectionist, public and nonobservable 
according to the behaviorist. (b) In their constructions of psychophysical 
scales, both the introspectionist and the behaviorist rely on a principle of 
correspondence between psychological magnitude and O’s estimates of 
physical magnitude. The former bases this principle on hypotheses concerning
O’s perceptual mechanism; the latter regards the principle as a stipulative
definition. (c) Psychophysical laws obtained by ratio and other scaling
procedures are explanations of O's behavior on the introspectionist view, 
descriptions of O’s behavior on the behaviorist view. In the past the intro- 
spectionist and the behaviorist interpretations have been run together, 
thus making it possible to amalgamate the advantages and obscure the 
deficiencies in both. These deficiencies suggest that the concept of psychological
magnitude ought to be abandoned, and that perceptual psychophysical
scaling procedures should be regarded as procedures for measuring
perceptual abilities. This suggestion violates both the old psychophysics 
of Fechner and the new psychophysics of Stevens. But it dissolves the 
traditional question of whether psychophysical measurement is possible, as 
well as the contemporary question of whether the validity of competing 
psychophysical scales can be determined. The above analysis is illustrated 
with a hypothetical fractionation experiment in which a ratio scale of psychological
length is constructed.


Psychophysics is generally regarded as
the attempt to measure or scale psychological
magnitudes. Very roughly we may
distinguish jnd (confusion, discriminability)
scales, partition (category, equisection)
scales, and ratio (magnitude) scales of psychological
magnitudes. The “new” psychophysics,
for which S. S. Stevens is largely
responsible, holds that the ratio scale is most
desirable, for the following reasons. It is
superior to a jnd scale, since it is constructed
by a “direct” method (Stevens, 1958a, p.
387), and since jnd’s are not “subjectively
equal” on so-called prothetic continua
(Stevens & Davis, 1936, pp. 411-416, and
Stevens, 1954, pp. 30-31; 1957, pp. 154, 172;
1960b, p. 227; 1960c, pp. 57-59; Hirsh, 1952,
pp. 10-11). It is superior to a partition
scale, since it enables us to say not only that
one psychological entity is greater than another,
but also how much greater (Stevens &
Davis, 1936, pp. 406-407, and Stevens,
1959b, p. 611; 1960, p. 28; 1960b, pp. 228-230);
and also because the ratio scale contains
the partition scale (1960c, pp. 53-54).


¹ After the acceptance of this Monograph, I sent
the manuscript to S. S. Stevens who has prepared
the appended reaction (pp. 33-38). Considerations
of time precluded my sending Stevens'
response to Savage for further comment. Editor.

² This research was brought to completion with
the bibliographical assistance of James R. Shaw,
whose services were provided by a University of
California research grant.


Furthermore, ratio-scaling procedures
have led to the discovery of a psychophysical
law of great generality and theoretical power
(Stevens, 1957, p. 162; 1958b, pp. 192-194;
1960a, pp. 28-29; 1960b, pp. 234-235; 1961,
p. 84; 1962, pp. 30-32). Stated as a first 
approximation the law is 


(1) $\psi = k\phi^{n}$,


where $\phi$ is the stimulus magnitude in physical
units, $\psi$ is the psychological magnitude
in psychological units, k is a constant determined
by the choice of units, and n varies
according to the sense modality in question.
(1) is a power law and contrasts with Fechner's
logarithmic law which states that


(2) $\psi = k \log \phi$,


where $\psi$ is the psychological magnitude
measured in jnd units. The “new” psycho- 
physics claims that (2) is invalid, because the 
method for obtaining it is “indirect” and 
employs jnd’s as psychological units. (1), on 
the other hand, is said to be based on “direct”
methods and consequently to employ
psychological units which accurately repre- 
sent psychological magnitudes. (For a 
review of recent developments in psychophysical
scaling see Ekman & Sjöberg, 1965.)
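
The contrast between the power law (1) and the logarithmic law (2) can be made concrete with a small numerical sketch. The constants used here (k = 1, n = 0.5, base-10 logarithms) and the stimulus values are arbitrary assumptions chosen for illustration; they are not taken from Stevens, from Fechner, or from the present paper.

```python
import math


def power_law(stimulus, k=1.0, n=0.5):
    """Equation (1): psychological magnitude as a power function of the stimulus."""
    return k * stimulus ** n


def log_law(stimulus, k=1.0):
    """Equation (2): psychological magnitude as a logarithmic function of the stimulus."""
    return k * math.log10(stimulus)


# Illustrative stimulus magnitudes, each double the one before (arbitrary physical units).
stimuli = [10.0, 20.0, 40.0, 80.0]

power_values = [power_law(s) for s in stimuli]
log_values = [log_law(s) for s in stimuli]

# Under the power law, successive values stand in a constant ratio (2 ** n).
power_ratios = [b / a for a, b in zip(power_values, power_values[1:])]
# Under the logarithmic law, successive values differ by a constant amount (k * log10(2)).
log_steps = [b - a for a, b in zip(log_values, log_values[1:])]

print("power-law ratios:", power_ratios)
print("log-law differences:", log_steps)
```

Equal stimulus ratios thus yield equal psychological ratios under (1) but equal psychological differences under (2), which is one way of stating why the two laws cannot both describe the same scale.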

Modern psychophysicists usually main- 
tain or imply that they have cast off the 
dualist metaphysics and the introspectionist 
methodology which frustrated their prede- 
cessors. Thus Galanter (1962, pp. 92-93) 
says: 


The name psychophysics derives from the clas- 
sical question about the relation between the 
physical environment and the mind. Today, 
modern psychophysicists are not professionally 
concerned with this philosophical issue of the 
mind-body relation, but rather with the con- 
straints that are placed upon the behavior of a 
person in his judgments, actions, and so on, by the 
sea of physical energies that surround him. 


And in Hirsh (1952, pp. 15-16) we read: 


The influence of behaviorism in American psy- 
chology is easily seen in modern psychophysics. 
We no longer look for relations between stimuli 
and sensations but rather relations between 
stimuli and responses. We can observe responses, 
the elements of behavior, and measure them, 
whereas the private sensation, which remains as 
untouchable as it was in Fechner’s day, does not 
concern us. We do not ask whether or not a man
hears a tone. We seek only to find whether or not 
he responds in a specified way to a tone. We can 
have measurement, then, on both sides of the psy- 
chophysical relation. The “psycho” part refers 
merely to behavior. 


(We will not discuss the interesting historical 
question of whether Fechner was trying to 
measure private, introspectional sensation, 
as is commonly supposed. The reader is re- 
ferred to Boring [1928] and Johnson [1929] 
who take opposite sides on this question.) 


The major claim of this study is that such 
declarations of philosophical enlightenment 
are premature. Many of the “new” psycho- 
physicists officially subscribe to a behavior- 
ist and operationist philosophy, but in their 
experimental and theoretical work employ 
introspectionistic assumptions and lapse into 
an introspectionistic view of the nature of
psychophysical measurement. They assure
us that theirs is an attempt to measure be- 
havioral responses, and yet they employ 
methods whose rationale seems to be that 
they enable the experimenter to quantify 
those private sensations which were formerly 
regarded as directly inaccessible. This indict- 
ment could be completely substantiated only 
by considering a large number of scaling 
methods and the work of a large number of 
psychophysicists. In this study we will con- 
centrate on ratio-scaling methods and on the 
work of S. S. Stevens, who is the principal 
architect of the “new” psychophysics. 

The principal feature of the analysis in 
this paper is a distinction between introspec- 
tionist and behaviorist interpretations of 
perceptual magnitudes, the psychological 
magnitudes involved in perception. The two 
interpretations are sketched in an early sec- 
tion of the paper and then presented in detail 
and criticized in succeeding sections. In the 
final section we illustrate and discuss the 
unfortunate consequences of the almost 
universal failure to distinguish the two inter- 
pretations. Our analysis may indicate that 
the concept of psychological magnitude is 
illegitimate, and that the attempt to scale 
such magnitudes ought to be abandoned. 
This would be to abandon not only the orien- 
tation of Stevens’ “new” psychophysics, but 
also that of the “old,” which Fechner 
founded. Even if this more sweeping conclu- 
sion cannot be sustained, our analysis at 
least shows that neither ratio, partition, nor 
jnd scales can be properly assessed and com- 
pared without distinguishing the introspec- 
tionist from the behaviorist view of psycho- 
physical measurement. 


A SAMPLE RATIO-SCALING EXPERIMENT


Ratio scales can be constructed either by 
numerical estimation methods—magnitude 
production and magnitude estimation, or by 
fractionation methods—ratio production and 
ratio estimation. (For an exhaustive classification
and description of scaling methods
see Stevens, 1958b.) The analysis of this 
study will be based on a sample experiment 
in which a scale for psychological length is 
constructed by the method of ratio estima- 
tion. Our major results apply equally, how- 
ever, to ratio scales constructed by any of the 
other appropriate methods. The experiment 
presented here is imaginary and stylized. 
Nonetheless, its main features have been ex- 
tracted from actual experiments, if not for 
length then for similar continua. And our 
analysis can, with only trivial adjustments, 
be applied to any actual ratio scaling experi- 
ment. We will call our sample experiment 
“M,” for “mak scale.” 

It consists of two parts. In each trial of the 
first part the experimenter (E) presents the
person experimented on (O) with a standard
rod, $\phi_s$, and asks him to select a comparison
rod, $\phi_c$, which looks one-fourth as long. In
each trial of the second part O is asked to 
select comparison rods which look one-half 
as long as the standards. The data thus ob- 
tained are represented by the solid lines in 
Figure 1. In both fractionations O consist- 
ently overestimates the comparison rod (or— 
should we say?—underestimates the stand- 
ard). The equation for the lower line is 


(3) $\phi_c = .33\phi_s$,
for the upper line,
(4) $\phi_c = .58\phi_s$.

Notice that the numerals along each axis of 
Figure 1 represent physical rather than psy- 
chological magnitudes. To measure the 
psychological magnitude involved in O's 
perception, a unit of psychological length 
must be chosen, and the data on which 
Figure 1 is based restated in terms of that 
unit. 

Fig. 1. A plot of hypothetical data for two
length fractionation experiments. The upper solid
line represents O's visual estimates of one-half,
the lower solid line O's visual estimates of one-fourth.
The dash lines are derived by procedures
and for purposes which are explained in a later
section.

Fig. 2. Three psychological curves for length
estimates. Abscissa: physical length in centimeters.
The lower, ratio curve is obtained from
the solid lines of Fig. 1. Points A-E are obtained
from the upper solid line of Fig. 1, points a-c from
the lower solid line in Fig. 1. The middle curve
represents the data of an imaginary category
scaling experiment, the upper curve the data of
an imaginary jnd experiment. The relations of
these two curves to the ratio curve are discussed
in later sections.

The choice of a unit of measurement is
made primarily on the basis of convenience.
Since it is convenient, let us stipulate that
the psychological length associated with a
stimulus rod of 100 cm. is 100 psychological
units, and let us call the unit thus defined the
mak. (For the origin of both the unit and its
name see Reese, Reese, Volkmann & Corbin,
1953, p. 41.) This choice of unit determines
the first point in a plot of psychological
length against physical length, the point
marked “A” in Figure 2. Referring back to
Figure 1, we find that O would estimate a
58-cm. rod to be half as long as a 100-cm.
rod. Hence, the psychological magnitude
associated with the former must be half that
associated with the latter. Since the latter
was 100 maks, the former must be 50 maks.
Thus we obtain the point labeled “B.” Referring
again to Figure 1, we see that O would
estimate a 33.6-cm. rod to be half as long as
a 58-cm. rod. Hence, the psychological magnitude
associated with the former must be
25 maks, half that associated with the latter.
This gives us Point C. D, E, and any
further points desired are obtained by the
same method, and a curve is fitted visually
to these points.
The equation for this curve is 


(5) y = din, 


where $\Psi$ is the psychological magnitude
expressed in maks and $\Phi$ is the physical magnitude
expressed in centimeters. A $\Psi$-$\Phi$ curve
can also be constructed in a similar manner
from Line (3), by assuming that when O says
one rod looks one-fourth as long as a second
the psychological magnitudes associated
with the two rods stand in the ratio 1:4. We
thus obtain Points a, b, and c, which precisely
correspond to Points A, C, and E of
the previous construction. The two psychophysical
curves are thus found to coincide,
which shows, presumably, that O's one-half
and one-fourth estimates of length are made
on the same psychological continuum.

The other two curves in Figure 2 are concave
downward. The lower of these is constructed
from data obtained in an imaginary
category scaling experiment, in which O is
asked to assign numerals from “1” to “11” to 
rod lengths. The O is asked to assign the 
numerals in such a way that the difference 
between a rod to which “1” is assigned and 
a rod to which “2” is assigned is the same as 
the difference between a rod to which “2” is 
assigned and a rod to which “3” is assigned, 
and so on. The category scale is represented 
on the inside left-hand ordinate. The upper 
curve in Figure 2 is constructed from data 
obtained in an imaginary jnd experiment. 
Number of jnd's is represented on the right- 
hand ordinate. Although all three curves are 
based on imaginary data, they are neverthe- 
less typical of the results of actual experi- 
ments dealing with length, area, finger span, 
and so on (see Reese, Reese, Volkmann, & 
Corbin, 1953; Stevens & Galanter, 1957; 
Stevens & Stone, 1959). 
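
The construction of the ratio curve from the upper line of Figure 1 can be traced numerically. The following is only a sketch: it takes the slope .58 from Equation (4) and the stipulation that 100 cm. corresponds to 100 maks, and it assumes (the names `halving_points` and `implied_exponent` are illustrative, not the author's) that a power function is fitted through the resulting points.

```python
import math

HALF_SLOPE = 0.58  # slope of the upper line of Figure 1, Equation (4)


def halving_points(count, start_cm=100.0, start_maks=100.0, slope=HALF_SLOPE):
    """Return (centimeters, maks) pairs for Points A, B, C, ... in order."""
    points = []
    cm, maks = start_cm, start_maks
    for _ in range(count):
        points.append((cm, maks))
        cm *= slope   # the rod O selects as looking half as long
        maks *= 0.5   # half the psychological length is assigned to it
    return points


def implied_exponent(slope=HALF_SLOPE):
    """Exponent n of a power function maks = a * cm**n through these points.

    Assumption: a power function is fitted; the source only says that a
    curve is fitted visually.
    """
    return math.log(0.5) / math.log(slope)


points = halving_points(5)                 # Points A through E
exponent = implied_exponent()              # greater than 1: concave upward
coefficient = 100.0 / 100.0 ** exponent    # forces 100 cm. -> 100 maks

for label, (cm, maks) in zip("ABCDE", points):
    print(f"Point {label}: {cm:6.1f} cm -> {maks:6.2f} maks")
```

On these assumptions Point C falls at about 33.6 cm., which is why a one-fourth line with a slope near .58 squared yields points that coincide with A, C, and E.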


TWO INTERPRETATIONS


($\alpha$) It is obvious that any interpretation
of Experiment M must contain a distinction
between psychological (subjective, apparent)
magnitudes and physical (objective, real)
magnitudes (see Guilford, 1954, p. 21). The
physical entities $\Phi_1$, $\Phi_2$, $\Phi_3$, etc. (of which
physical length is a magnitude) are associated
with psychological entities, $\Psi_1$, $\Psi_2$,
$\Psi_3$, etc. (of which psychological length is a
magnitude), which are involved in O's
perception of physical entities (identity of
subscript indicates association of physical
entity with psychological entity).

Magnitudes may be classified as intensive 
or extensive (Bergmann & Spence, 1944, pp. 
5-8; Savage, 1963, Ch. 3; Stevens & Volk- 
mann, 1940, pp. 330-331). If psychological 
length is an intensive magnitude, then any 
$\Psi$ can meaningfully be said to be greater
than, equal to, or less than another. Conse- 
quently, we can ask whether psychological 
length varies with physical length and, if so, 
whether inversely or directly. If psychologi- 
cal length is an extensive magnitude, then 
any given $\Psi$ can meaningfully be said to be
half as great, twice as great, 10 times as 
great, etc. as another. Consequently, we may 
sensibly attempt to measure psychological 
length in the fullest sense, that is, to erect a 
ratio scale for this magnitude, and to deter- 
mine precisely how, in terms of its scale and 
the centimeter scale, psychological length 
varies with physical length. Experiment M 
attempts to do this, and psychophysical law 
(5) is the result. 

($\beta$) Any interpretation of Experiment M
must commit the procedure therein em- 
ployed to a principle of correspondence be- 
tween relative psychological length and O's 
estimates of relative physical length. In the 
experiment (a) E measures the physical 
length of rods to be presented to O, (b) O es- 
timates the relative physical length of these 
rods, (c) E represents the data thus obtained 
in Figure 1, and (d) E scales psychological 
length by constructing Figure 2 from Figure 
1. (a), (b), and (c) are straightforward 
enough, but (d) is puzzling. How, from the 
data of physical measurement and physical 
estimates, is a psychological magnitude 
scaled? Such scaling will appear to be a
conjuring trick unless certain of its assumptions
are brought to light. Chief among these
is what we have called the principle of correspondence,
the general statement of which
is: If O estimates that $\Phi_1$, $\Phi_2$, $\Phi_3$, etc. stand
in the physical relation R, then $\Psi_1$, $\Psi_2$, $\Psi_3$,
etc. stand in the psychological relation R.
The instance of this principle which is used
in constructing a jnd scale is (A): If O estimates
that $\Phi_1$ and $\Phi_2$ are just noticeably
different and that $\Phi_2$ and $\Phi_3$ are jnd, then the
interval between $\Psi_1$ and $\Psi_2$ is equal to that
between $\Psi_2$ and $\Psi_3$. The instance used in
constructing a partition scale is (B): If O
estimates that the interval between $\Phi_1$ and
$\Phi_2$ equals that between $\Phi_2$ and $\Phi_3$, then the
interval between $\Psi_1$ and $\Psi_2$ equals that between
$\Psi_2$ and $\Psi_3$. The instance used in constructing
a ratio scale is (C): If O estimates
that $\Phi_1$ and $\Phi_2$ stand in the same ratio as $\Phi_2$
and $\Phi_3$, then $\Psi_1$ and $\Psi_2$ stand in the same
ratio as $\Psi_2$ and $\Psi_3$. The more specific instance
used in constructing the ratio scale in
Experiment M is (C1): If O estimates that $\Phi_1$
and $\Phi_2$ stand in the ratio m:n, then $\Psi_1$ and
$\Psi_2$ stand in the ratio m:n. Notice that the
method of numerical estimation also employs
a principle of correspondence, one that is
intimately related to (C). The principle is
(D): If O assigns to $\Phi_1$, $\Phi_2$, $\Phi_3$, etc. the
numerals m, n, o, etc. respectively, then
$\Psi_1 = k(m/n)\Psi_2$, $\Psi_2 = k(n/o)\Psi_3$, etc.,
where either $k = 1$ or $k \neq 1$. If we make the
standard assumption that $k = 1$, then the
principle becomes (D1): If O assigns to $\Phi_1$,
$\Phi_2$, $\Phi_3$, etc. the numerals m, n, o, etc.
respectively, then $\Psi_1$, $\Psi_2$, $\Psi_3$, etc. stand in
the ratios m:n:o: etc. (For examples of the
explicit use of (C1) see Harper & Stevens,
1948, p. 345, and Reese, 1943, pp. 22-23).

($\gamma$1) The introspectionist interpretation
of Experiment M holds that $\Psi$s and their
magnitudes are privately observable and that
the principle of correspondence embodies a
theory of O’s perceptual mechanism. This interpretation
preserves a simple, attractive,
and familiar view of the nature of psychophysics.
Psychological entities are held to
occupy a private realm distinct from the
public realm occupied by physical entities.


Nonetheless, psychological magnitudes are 
as “empirically real” as and no less funda- 
mental than physical magnitudes. Conse- 
quently, it is a legitimate scientific enterprise 
to try to measure $\Psi$ magnitudes, much as we
measure $\Phi$ magnitudes (e.g., with a meter
stick), and to attempt to discover the mathematical
relation between $\Psi$ magnitudes and
$\Phi$ magnitudes, just as we attempt to discover
the mathematical relation between two $\Phi$
magnitudes. Psychophysics is thus “an exact
theory of the functionally dependent rela- 
tions of body and soul” (Fechner, 1966), the 
attempt to extend the best procedures of 
physical science into the psychological realm. 
Another advantage is that the introspec- 
tionist interpretation leaves no doubt that 
M is a psychological experiment. How do the 
physical measurements and physical esti- 
mates obtained in M become transformed 
into a scale of psychological length? The in- 
trospectionist answers that O perceives psy- 
chological entities by some inner sense and 
uses these inner estimates to make outer 
estimates. Although O reports only the
latter, E can make use of the former by
hypothesizing a certain perceptual mechanism
in O.

This hypothesis, however, produces one of 
the disadvantages in the introspectionist 
interpretation. The introspectionist assumes 
that O visually estimates the length of rods,
$\Phi$s, by introspecting the magnitude of psychological
entities, $\Psi$s. But it is doubtful
that any perceptual mechanism of this sort 
is at work in O when he estimates rod length. 
Putting the difficulty differently, principle of 
correspondence (C1) is based on premises
describing O's perceptual mechanism which 
are entirely problematic. These premises will 
be stated and thoroughly examined in the 
sequel. A related disadvantage is that the 
introspectionist interpretation does not seem 
to fit the phenomenological facts of Experi- 
ment M. O observes rods and estimates their 
relative length. But there is no phenomenological
evidence that he observes private, psychological
entities and estimates their magnitude.
A further disadvantage is that $\Psi$s
are held to be private, hidden from the view 
of everyone but O. E must therefore rely on
O's unconfirmed $\Psi$ estimates in constructing
a ratio scale of psychological magnitude. 
This seems to place psychological magni- 
tudes outside the pale of “objective” scien- 
tific investigation and measurement. 

($\gamma$2) The behaviorist interpretation holds
that $\Psi$s and their magnitudes are nonobservable
and that the principle of correspondence
is true by definition. This interpretation is 
consistent with the phenomenological fact 
that Experiment M seems to involve obser- 
vations of physical entities only. A further 
advantage is that psychological entities are 
no longer located in some private realm, 
directly accessible only to O. Now they can 
be regarded as generally available to scien- 
tists, like any other magnitude which is 
capable of metricization and scientific treat- 
ment. But these advantages are purchased 
at what appears to be a price. 

For the distinction between psychological 
and physical magnitudes loses its sharpness 
and obviousness. It is now seen as unwise to 
think of Experiment M as an attempt to 
“discover the relation between mind and 
body,” since the dualism of parallel realms 
implicit in such a characterization is being 
brought to question. The dualist view im- 
plies that psychological entities are, although 
private, as “empirically real" as physical 
entities. But the behaviorist holds that they 
are theoretical constructs or "fictions," con- 
ceptual inventions of the experimenter de- 
vised for some anticipated scientific use. 
The principle of correspondence, (C1), is the
tool for building these constructs. It is a 
stipulative definition, not a description of 
some mechanism underlying O's perception 
of physical length. And the psychophysical 
law, (5), made possible by these conceptual 
maneuvers must be regarded, not as an 
explanation, but as a description of O's length 
perceptions. $\Psi$s thus become a useful but
dispensable façon de parler.


THE INTROSPECTIONIST INTERPRETATION


The Nature of Psychological Entities 


One difficulty for the introspectionist in- 
terpretation arises from its assertion that O 
observes private entities of which psycholog- 
ical length is a magnitude. The O sees rods, 
walls, and the experimenter; he feels his
chair and other physical objects; he hears
the sound of E's voice; and so on. This description
of what O observes does not mention
any private, psychological entities. If O
does observe such entities, how does he do 
it: by seeing them, smelling them, "intuit- 
ing" them? This and related difficulties 
cannot be assessed until more content is 
given the notion of a psychological entity. 
Several suggestions deserve consideration. 

The first is that $\Psi$s are visual sensations:
private mental events or processes which 
occur within O during rod perception. Thus 
briefly stated, the suggestion remains ob- 
scure, since we have not yet given any in- 
stances of visual sensations or any reason to 
believe they exist. The obscurity can be 
removed by offering visual afterimages as 
familiar paradigm examples of visual sensa- 
tions. On this suggestion, to say of O that he 
has a visual sensation is to say that he has a 
visual afterimage or something like one. A 
clear meaning is thus given to the assertion 
that visual sensations are private. The O's 
afterimages are logically private, since it is a
logical, not merely an empirical fact, that 
his afterimages can be perceived only by 
him. Furthermore, the mode of perception 
now becomes clear: visual afterimages and 
whatever is like them are seen. 

A second suggestion distinguishes between 
psychological and physical magnitudes, but 
not between psychological and physical 
entities. The psychological length of rods is 
their length as perceived by O; their physical
length is their length as measured by E. $\Psi_1$
and $\Phi_1$ refer, not to different entities, but to
different aspects of the same entity. As it 
stands, this suggestion is unclear, since we 
still do not know what “length as perceived 
by O" means. One way of removing the 
unclarity is to suggest that perceived length 
is the length in his visual field of rods seen 
by O. (Other ways of removing the unclarity 
take us into the behaviorist interpretation.) 
This suggestion preserves the logical privacy 
of psychological magnitudes, since it is a 
logical fact that only O can perceive the 
length in his own visual field of a rod. It does 
not preserve the privacy (nor the separate- 
ness) of psychological entities. In addition, 
the suggestion implies that psychological 
length is perceived by sight. 

A third suggestion holds that $\Psi$ magnitudes
are magnitudes of physiological processes
occurring within O during rod perception.
These may be specified as retinal processes
(length or area of retinal stimulation, retinal
electrical potential, etc.); as optic nerve
processes (frequency of nerve impulses,
number of activated fibres, etc.); or as brain
processes (area of stimulation in the occipital
lobes, electrical potential in the lobes, etc.).
It is not as strange as it may seem to classify
this suggestion as introspectionist. For it
contends, like the other two, that $\Psi$ magnitudes
are privately observed by O. Unlike
the others, it does not make clear by what
faculty $\Psi$s are perceived. Are optic nerve
impulses felt? Are retinal processes seen? 
Furthermore, physiological processes, unlike 
visual sensations, are contingently rather 
than logically private. Although it may in 
fact be true that only O perceives the retinal 
processes occurring to him, these can be 
observed by other perceivers, either now or 
in the future, by means of suitable instru- 
ments. 

Each of the three suggestions above must 
be rejected for the same two reasons. 

First, there is no phenomenological evi- 
dence whatsoever that O observes during M 
any of the $\Psi$ magnitudes suggested. As regards
the first suggestion, no afterimages are
induced in O; he sees none of these, nor 
anything like them. The complainant who 
says that an afterimage is a poor paradigm 
for a visual sensation must produce a better 
one, on pain of leaving the notion of a visual 
sensation wholly obscure. In defense of the
second suggestion, it may be said that any
rod seen by O must have a length in his
visual field. Even if we concede this less than 
clear contention, still there is no reason to 
believe that O is aware of every—or even of 
any—rod’s length in his visual field. He 
makes no reports of and seems to pay no 
attention to such magnitudes. From the objector
who complains that length in the
visual field is a poor paradigm for the notion 
of apparent length, we must demand a better 
one, else the notion of apparent length will 
remain entirely obscure. As for the third 
suggestion, one way of emphasizing that O
perceives no retinal, nerve, or brain processes 
is by pointing out that an observer who had
never heard of retinas, optic nerves, or
brains could function quite as well in Experiment
M as a professional physiologist. Our
general criticism in this paragraph can be
reinforced by contrasting the sort of instruc- 
tions actually given in M ("Select the rod 
which is half as long as the one I now hold 
up") with the instructions which would be 
required in any of the suggestions mentioned 
(e.g., "Select the rod which has a size in your 
visual field half that of the rod I now hold 
up"). The new instructions would produce 
entirely different experiments. 

Secondly, Experiment M cannot be construed
as a procedure for measuring the $\Psi$s
mentioned in any of the three suggestions. 
The standard procedures for measuring the 
size of an afterimage, or the size in the visual 
field of a rod, require O to view the image or 
rod against a screen at a fixed distance and
to indicate the area on this screen
occluded by the image or rod. The E can
then express the size of the afterimage, or 
the size in O's visual field of the rod, in 
terms of the size in centimeters or inches of 
the occluded area. No screen is used in Ex- 
periment M, and no determinations of oc- 
cluded area are made. It is even more ob- 
vious that M is not a procedure for measur- 
ing physiological processes. The area of 
retinal stimulation is measured by applying 
an electroretinoscope, or, less directly, by 
computations based on the construction of 
the eye and the laws of optics. Instrumental 
and computational methods are also em- 
ployed in measuring nerve impulses, and 
brain processes. None of these methods is 
found in M. 

The foregoing analysis shows that the 
concept of a psychological entity is unclear 
and scientifically unacceptable. When we try 
to understand the concept by producing 
possible instances—afterimages, things in 
the visual field, physiological processes—the 
two objections just presented become ap- 
plicable. These objections are so conclusive, 
and in a way so obvious, as to make it seem 
that psychological entities could not be any 
of the things suggested. Thus the concept 
dissolves like a mist exposed to the light of 
day. And it is a mist, a vague penumbra, in 
the thinking of the introspectionist. He has 
some vague notion of psychological entities
(“sensations,” he usually calls them) located
in a private mental realm within the per- 
ceiver. But they are not afterimages, nor 
things in the visual field, nor physiological 
processes, all of which are measurable. Re- 
garded in this obscure fashion, "sensations" 
are probably unintelligible, and certainly 
outside the reach of scientific investigation 
and measurement. The O's sensations cannot 
be measured by E because of their privacy. 
But neither can they be measured by O, 
since they are not the sort of entities to 
which the operations required in measure- 
ment can be applied. Measurement may be 
defined as a procedure for assigning numerals 
to a class of objects by an operation of com- 
paring the objects with a unit or units. For 
physical length the operation of comparison 
is laying a ruler alongside the object, for 
physical weight it is placing the object on a 
balance. What operation of comparison with 
a unit can O apply to his sensations? 

Faced with this problem, the introspec- 
tionist may concede that the mak scale does 
not qualify as sensation measurement, on the 
above definition of that term, but insist that 
the definition is too narrow. Measurement, 
he may say, is any assignment of numerals
to a class of objects which represents the magni- 
tude ratios of the objects. This definition tells 
us that it is the result and not the manner 
of numeral assignment which is important. 
Numerals may be assigned by means of an 
operation of comparing objects with a unit, 
or in some quite different manner, like that 
in Experiment M. As long as the result is an 
assignment which represents magnitude ra- 
tios, measurement can be said to have taken 
place. 

The content of this reply was suggested 
to the writer by Stevens’ discussions of the 
nature of measurement. Stevens (1959a, p. 
24) maintains that measurement is “the 
assignment of numerals to aspects of objects 
or events according to rule.” On the basis of 
four different rules for numeral assignment 
he distinguishes four types of scales: nominal, 
ordinal, interval, and ratio (Stevens, 1951, 
1959a). But the rules mentioned describe 
only the result of numeral assignment. Thus 
the rule for an interval scale is: Assign 
numerals so as to represent equal intervals. 
The rule for a ratio scale is: Assign numerals
so as to represent equal ratios. Stevens
(1951, pp. 28-29) denies that a physical 
operation of addition is required to create 
even a ratio scale. And he (Stevens, 1959b, 
pp. 614-615) says that Fechner’s mistake 
was in believing that “measurement must be 
reducible to counting [the constituents of 
sensation]." All this seems to suggest that 
any method which produces a ratio scale (or 
any of the other types) is properly under- 
stood as measurement, which is just to say 
that it is the result and not the manner of a
procedure of numeral assignment which 
makes it one of measurement. (It should be 
noted that Stevens officially subscribes to a 
behaviorist position and that he would 
probably deny vehemently that his theory of 
measurement suggests an introspectionist 
interpretation of psychological scaling. 
Nevertheless, see the extensive discussion of 
his writings in a later section.) 

Whether this reply can or cannot be forced 
on Stevens, it is important to formulate it 
and to see that it is unacceptable, unaccept- 
able because it confuses sensation estimation 
with sensation measurement, and illegiti- 
mately substitutes the one for the other. An 
analogy is helpful in presenting this critical 
distinction. Suppose that O sees from a dis- 
tance the shadows cast by rods on a wall, 
but that E can neither see nor apply his
meter stick to the shadows. Wishing never- 
theless to measure them, and believing that 
O is able to make accurate estimates of 
shadow length, E uses him in a two-part 
experiment. In each trial of the first part he 
asks O to locate the rod, $\Phi_1$, which casts a
shadow, $\Psi_1$, one-fourth as long as the
shadow, $\Psi_2$, cast by rod $\Phi_2$. In each trial of
the second part he asks O to designate the rod
which casts a shadow one-half as long as the 
shadow cast by a given rod. With these data 
in hand E constructs first a figure like Figure 
1 and then one like Figure 2, and announces 
with elation that he has measured the in- 
accessible shadows. 

It is clear, however, that shadows have 
not been measured. The shadow observer 
determines shadow length, not by measure- 
ment, but by direct estimation. The shadow 
experimenter determines shadow length by 
relying on the observer's estimates, that is
to say, by indirect estimation. Similarly, 
the "sensation" observer determines the
magnitude of his sensations, not by measure- 
ment, but by direct estimation. And the 
sensation experimenter determines the mag- 
nitude of O's sensations by relying on the 
latter’s sensation estimates, that is to say, 
by indirect estimation. To call any of these 
procedures measurement is to violate a 
distinction which must be preserved by any 
acceptable definition of the term, the dis- 
tinction between estimates and measure- 
ments. (For more on this distinction, see 
Savage, 1963, pp. 160-162, 250-252, 300- 
302.) 


The Principle of Correspondence 


On the introspectionist interpretation this 
principle is derived from certain hypotheses 
about the nature of O's perceptual mech- 
anism. We will illustrate the derivation for 
the principle employed in obtaining Line (5) 
from Line (4). 


(Ca) If O (indirectly) estimates that $\Phi_1$ and $\Phi_2$
stand in the ratio 1:2, then O (directly) estimates
that $\Psi_1$ and $\Psi_2$ stand in the ratio 1:2;

(Cb) If O (directly) estimates that $\Psi_1$ and $\Psi_2$
stand in the ratio 1:2, then $\Psi_1$ and $\Psi_2$ stand in the
ratio 1:2;

(C1) Hence, if O (indirectly) estimates that $\Phi_1$
and $\Phi_2$ stand in the ratio 1:2, then $\Psi_1$ and $\Psi_2$ stand
in the ratio 1:2.


Informally, (Ca) says that O's estimates 
of psychological ratios are the same as his 
estimates of associated physical ratios, (Cb) 
asserts that O’s estimates of psychological 
ratios are accurate, and (C1) concludes that
psychological ratios correspond to estimated 
physical ratios. The reader may find it help- 
ful in understanding this argument and the 
perceptual mechanism it describes, to employ 
the shadow analogy presented earlier, that 
is, to think of $\Psi$s as shadows observed only
by O and $\Phi$s as rods which cast the shadows.

Premise (Cb). This premise is often as- 
sumed, rarely justified. Stevens assumes it in 
a number of places (1936, p. 408; 1956, p.
18), and so do Guilford and Dingman (1954,
p. 395). At times Stevens seems to be attempting
a justification of the premise (1951,
pp. 40-41; 1954, p. 30; 1956, pp. 2, 23). Let
us ask how we may determine whether O’s
psychological estimates are accurate. It 
would seem that the method must be analogous
to that for determining whether estimates
of physical ratios are accurate. Suppose
we wish to know whether O's estimate
that a given rod is half as long as another is 
accurate. We simply measure the physical 
lengths of the two rods and then compare 
measured physical length with estimated 
physical length. If the ratio as determined 
by measurement is the same as the ratio as 
estimated by O, then O's estimate is ac- 
curate; otherwise it is inaccurate. Note that 
the rods must be measured by some pro- 
cedure which does not depend on O's esti- 
mates of rod lengths. Otherwise the accuracy 
test will be circular and not a genuine test. 
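
The accuracy test just described can be put in a few lines. This is a minimal sketch, not a procedure given in the source: the function name `estimate_is_accurate` and the tolerance value are assumptions introduced for illustration, and the measured lengths are taken to come from an instrument such as a meter stick, independently of O's estimates.

```python
def estimate_is_accurate(measured_a_cm, measured_b_cm, estimated_ratio,
                         tolerance=0.02):
    """Compare O's estimated ratio a:b with the independently measured ratio.

    `tolerance` is a hypothetical margin; the source states only that the
    measured and estimated ratios must be the same.
    """
    measured_ratio = measured_a_cm / measured_b_cm
    return abs(measured_ratio - estimated_ratio) <= tolerance


# O says a 58-cm. rod is half as long as a 100-cm. rod, as in Figure 1.
overestimate = estimate_is_accurate(58.0, 100.0, 0.5)

# O says a 50-cm. rod is half as long as a 100-cm. rod.
exact = estimate_is_accurate(50.0, 100.0, 0.5)
```

The test is noncircular only because the two measured lengths are obtained without consulting O, which is the condition the argument turns on.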

By analogy, a test of accuracy for O's 
estimates of psychological length seems to 
require a procedure for measuring psycholog- 
ical length. This procedure must be applied 
and then measured psychological length com- 
pared with estimated psychological length. 
But the procedure for measuring psycholog- 
ical length must be independent of O's esti- 
mates of psychological length, if the test is 
to be noncircular. The mak-scale procedure 
is thus precluded, since it relies on O’s psy- 
chological estimates. But what other pro- 
cedure is available? If, as it appears, there 
is none, then (Cb) is unverifiable. Let us 
recapitulate and broaden the difficulty.
Experiment M is put forward by its intro- 
spectionist proponents as a method for 
measuring psychological length. Now the 
experiment rests on the assumption that O's 
estimates of psychological length are ac- 
curate. But this assumption cannot be 
verified without some method for measuring 
psychological length which is prior to and 
independent of M. Hence, to recommend M 
as a method for measuring psychological 
length begs the question of whether psy- 
chological length is measurable. 

Since physiological processes are only con- 
tingently, and not logically, private, the 
above objection does not apply unmodified 
to the suggestion that O estimates physiolog- 
ical processes in M. Where O's estimates of 
psychological length are construed, for 
instance, as estimates of nerve-impulse fre- 
quency, we can test their accuracy in a non- 
circular manner by connecting an oscillo- 
scope to electrodes placed on the nerve in 
question. Even so, related difficulties arise. 
First, O's (putative) nerve-impulse estimates
have not been tested for accuracy. So the 
assumption of their accuracy is, although not 
unverifiable, unverified. Secondly, in testing 
O's (putative) nerve-impulse estimates for 
accuracy—which must be done before the 
mak scale can be accepted—we measure 
with instruments the very magnitude which 
M is supposed to scale. M is therefore 
superfluous, since in justifying it we accom- 
plish its purpose. Thirdly, M cannot be 
construed as a method for measuring, as 
opposed to estimating, nerve-impulse fre- 
quency. The only possible justification for 
employing M to quantify a physiological 
magnitude is that the magnitude is presently 
inaccessible to existing instruments and 
physiological techniques, and that we may 
scale the magnitude by means of O's inner 
observations of it until improved instru- 
mentation and technique make reliance on 
such a second-best method unnecessary. 
But this justification clearly implies that 
Experiment M provides us only with a 
method for estimating a physiological mag- 
nitude (allegedly) observed by O and that 
measurement of such magnitudes is accom- 
plished by instruments and physiological 
techniques. The importance of the distinction
between estimation and measurement
has already been sufficiently emphasized. 
Premise (Ca). This premise says that O's 
estimates of psychological ratios are the 
same as his estimates of the associated 
physical ratios. But what is its basis? Re- 
member that O provides E with estimates of 
physical magnitudes only in Experiment M. 
He does not say “$\Psi_1$ is half as great as $\Psi_2$,”
but rather “The length of this rod ($\Phi_1$) is
half as great as the length of that rod ($\Phi_2$).”
Why assume that when O makes this latter
report $\Psi_1$ and $\Psi_2$ stand in the ratio 1:2?
Perhaps he has learned to say “$\Phi_1$ is half as
great as $\Phi_2$” when $\Psi_1$ and $\Psi_2$ stand in the
ratio 1:3 (Garner [1954, p. 74] is the only 
experimenter known to the writer who seems 
to consider such possibilities). If so, and if 
his Y estimates are accurate, then the mak 
scale in Figure 2 does not correctly represent 
$\Psi$ ratios. For, on this assumption, Point B
should be placed at 33.3 on the y axis, Point 
C at 11.1, and Point D at 3.7. This would
produce a different scale of psychological
magnitude, and a different psychophysical
law.
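
The alternative placements follow from simple arithmetic, sketched below. The helper `scale_values` is a hypothetical name; the only inputs taken from the text are the 100-mak starting value at Point A and the two candidate psychological ratios, 1:2 and 1:3, attached to each report of "half as long."

```python
def scale_values(psi_ratio, steps, start_maks=100.0):
    """Return mak values for Points A, B, C, ... when every 'half as long'
    report is taken to mean that the psychological ratio is `psi_ratio`."""
    values = [start_maks]
    for _ in range(steps):
        values.append(values[-1] * psi_ratio)
    return values


assumed = scale_values(1 / 2, 3)      # the scale as drawn in Figure 2
alternative = scale_values(1 / 3, 3)  # the scale if the true ratio is 1:3

for label, a, b in zip("ABCD", assumed, alternative):
    print(f"Point {label}: {a:6.1f} maks (1:2)   {b:6.1f} maks (1:3)")
```

The physical lengths at which the points are plotted do not change; only their heights do, which is why the two assumptions yield different curves from the same data.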

We must know precisely how O's internal 
estimates relate to his external estimates in 
order to obtain the correct curve in Figure 2. 
But no one possesses this knowledge at pres- 
ent, and there is no clear way of obtaining it. 
The introspectionist holds that $\Psi$ magnitudes
are private to the observer. How then can E
discover which $\Psi$ estimates serve as a basis
for O's $\Phi$ estimates of one-half? It seems that
the only conclusive way is by asking O:
“When you estimate that $\Phi_1$ and $\Phi_2$ stand
in the ratio 1:2, what estimate do you make
of the relative magnitude of $\Psi_1$ and $\Psi_2$?”
To see that the meaning of this question is 
not clear we need only to phrase it in ac- 
cordance with any of the suggestions regard- 
ing the nature of psychological entities 
mentioned earlier; that is, *When you esti- 
mate that a given rod is half as long as 
another, what estimate do you make of your 
afterimages (sizes in your visual field, areas 
of stimulation on your retina)?” Sophisti- 
cated observers, as well as naive ones, would 
be at a complete loss in the face of such ques- 
tions. 

It is important to realize that neither intraobserver agreement nor
interobserver agreement can be used to determine the truth or
falsity of (Cl). There is an inclination to suppose that if several
observers estimate that Φ₁ and Φ₂ stand in the ratio m:n, then Ψ₁
and Ψ₂ stand in that ratio. There is an even stronger inclination to
suppose that if a single observer estimates on several occasions
that Φ₁ and Φ₂ stand in the ratio m:n, then Ψ₁ and Ψ₂ stand in that
ratio. But how are these assumptions to be justified? It is possible
for several observers to estimate that Φ₁ and Φ₂ stand in the ratio
m:n when Ψ₁ and Ψ₂ stand in the ratio m:n for one observer, n:o for
another, p:r for a third, and so on. It is possible for a single
observer to estimate that Φ₁ and Φ₂ stand in the ratio m:n when Ψ₁
and Ψ₂ stand in the ratio m:n on one occasion, n:o on another, p:r
on a third, and so on. These possibilities can be ruled out only by
comparing estimated physical ratios with actual psychological
ratios. Such comparison is possible only if there is some way of
determining psychological ratios which is independent of the
observer's estimates. No method of this type is available. Nor is
one possible if psychological entities are logically private. For if
they are, then every method for determining their actual magnitude
must rely on the observer's estimates of their magnitude, and is
therefore viciously circular.

It will be said that misleading estimates 
like those imagined above occur only where 
the observer attends to the stimulus rather 
than the sensation, that is, where he com- 
mits the "stimulus error", and that they 
can be avoided by giving observers instruc- 
tions to attend only to the sensation. But 
how can E be certain that O has followed 
these instructions? Again, it is impossible 
without an adequate, noncircular method for 
comparing psychological magnitudes with 
estimated physical magnitude. Whatever the 
instructions to the observer, intraobserver 
and interobserver agreement provide no 
better grounds for thinking that O's esti- 
mates accurately reflect psychological length 
ratios than they do for thinking that his 
estimates accurately reflect physical length 
ratios. Consequently, statements like the 
following are erroneous: 


In scaling experiments we are forced to assume 
the uncontrolled ability of the subject accurately 
to report his sensations . . . the reproducibility of 
the data upon repetition of [the] experiment lends 
some support to this assumption [Galanter, 1962,
p. 142]. 


The Purpose of Ratio Scaling 


What, on the introspectionist interpreta- 
tion, is the purpose of constructing the mak 
scale of psychological length? To some the 
answer may seem obvious. Many, if not all, 
empirical scientific laws are statements re- 
lating two or more variables which are de- 
fined independently of one another. And those 
laws which give us the mathematical rela- 
tion are obviously more useful than those 
which do not. For instance 


(6) V=kT 


is the physical law relating the volume and 
the temperature of a gas where pressure is 
held constant. No further justification is 
needed for measuring pressure and tempera- 
ture than that it makes the formulation and 
verification of (6) possible. And the attempt
to discover (6) needs no justification. Similarly,
Psychophysical Law (5) relates two
variables, except that one of these is psy- 
chological. (5) tells us how psychological 
length varies with physical length when the 
experimenter’s instructions to the observer 
as well as certain other elements of the per- 
ceptual situation are held constant. Measur- 
ing physical length, by a meter stick, and 
psychological length, by the mak scale, are 
adequately justified by pointing out that 
they make the discovery of (5) possible. 

The above argument is dubious, first of 
all, in its contention that the attempt to 
discover correlations between two or more 
variables needs no justification. Surely some 
correlations are more useful than others, and 
surely some are completely useless. But let 
this pass. The more important criticism for 
our purpose is that psychological length does 
not appear to be a magnitude like pressure 
or temperature. Pressure and temperature 
are observable, “empirically real” magni- 
tudes, whereas psychological length does not 
appear to be so. In addition, pressure is 
defined independently of temperature, and 
vice versa; but psychological length does not 
appear to be definable independently of 
physical length. These points will become 
clear in the discussion of the behaviorist 
interpretation to follow. 

A second introspectionist justification for 
measuring psychological length points out 
that (Ca)-(Cb)-(Cl) is a theory of O's per- 
ceptual mechanism, a theory designed to 
explain how O visually estimates length; and 
that Psychophysical Law (5), which is the 
mathematical completion of this theory, de- 
pends on the mak scale. The argument for 
this theory of O's perceptual mechanism is 
that failure to accept it will leave unex- 
plained O's ability to make reliable estimates 
of physical length. This argument is far 
from conclusive. 

In the first place, it is doubtful, as we have 
argued, that (Ca) and (Cb) are verifiable. 
If they are not, then the theory in which they 
figure is not scientifically respectable. Sec- 
ondly, alternative theories of the same type 
with equal explanatory power can be ob- 
tained by replacing (Cb) with some other 
assumption, by assuming that physical esti- 
mates of 1:2 are based on psychological 
estimates of 1:3, or 1:4, or 2:3, etc. Thirdly,
it has yet to be shown that any of the alternative
theories just indicated are required to
explain O's ability to make estimates of relative
physical length. Why must we suppose
that O bases his estimates of physical ratios 
on estimates of psychological ratios? Why 
isn't the following explanation sufficient? 
The O's eyes are normal, and he has learned 
how rods look when they stand in the ratio 
1:2; hence, when he looks at rods he can 
say with some accuracy whether or not they 
stand in this ratio. That this explanation 
implies no perceptual mechanism like that 
in (Ca)-(Cb)-(Cl) is not enough to reject it. 

It may be well to include a warning against 
attempting an epistemological justification of 
ratio scaling. The essential contention of the 
introspectionist theory of O's perceptual 
mechanism is that O bases his (always in- 
direct) estimates of physical entities on his 
(always direct) estimates of psychological 
entities. This sort of perceptual theory is 
often subsumed under or associated with the 
philosophical, epistemological theory that 
knowers base their (always indirect) knowl- 
edge of the external environment on their 
(always direct) knowledge of private, in- 
ternal processes, these last being causal 
results or at least reflections of the environ- 
ment. This epistemological theory is clearly 
dualistic, since it posits parallel psychological
and physical realms, and introspectionistic,
since O is believed to know the internal
realm by some process of inner perception,
that is, by “introspection.” The theory
seems both to establish our knowledge of 
the external world and to explain how we 
obtain it, and it has, as a consequence, 
attracted psychologists and philosophers for 
centuries. 

But there are problems, the most impor- 
tant of which is an instance of the sceptical 
problem of solipsism. If O directly perceives 
only internal entities, then how can he know 
that his inferences concerning the existence 
and character (physical length, for instance) 
of external entities are accurate? Some 
theorists may suppose that this problem is 
solved by the discovery of such laws as (5), 
that if O learns from E the precise relation
between his internal entities (Ψs) and external
entities (Φs), then he can make reliable
inferences from the ones to the others.
(Gibson [1950, pp. 186-187] seems to make 
some such supposition. He is properly taken 
to task for it by Price [1953, p. 410].) But 
the sceptical problem which confronts O 
also confronts E. The E also directly per- 
ceives only internal entities; consequently, 
his inferences concerning the physical length
of the experimental rods are equally problematic.
Since E must determine the
physical lengths of the experimental rods in
order to establish Psychophysical Law (5), 
and since E's physical length inferences are 
just as problematic as O's, O cannot rely 
on the validity of (5) to support his own 
inferences concerning physical length. The 
blind cannot be led by the blind. (This difficulty
is related to the “psychologist’s circle”
which Boring [1931], Bergmann & Spence
[1944, pp. 2-5], and others have discussed.)
It is no reply to say that E measures the 
rods—with a meter stick, say—and thus 
insures the accuracy of his determinations of 
their length. For E's measurements are
accurate only if he correctly infers the
physical lengths of meter-stick segments
from their psychological lengths. And how
can he know that these inferences are cor- 
rect? 

If, as it appears, the solipsist problem 
arises in an introspectionist but not in a 
behaviorist interpretation of psychophysical 
measurement, then this is clearly an argu- 
ment in favor of the latter. 


THE BEHAVIORIST INTERPRETATION 


The Principle of Correspondence 


It is often said that scales like the one 
constructed in Experiment M are "scales of 
observer response.” This statement is either 
imprecise or false. For it suggests that psy- 
chological magnitudes are literally magni- 
tudes of O's responses: perhaps the loudness 
of his verbal reports, or their pitch, or the 
time required to make them, etc. Obviously 
none of these suggestions is acceptable. 
There is no reason to suppose that the loud- 
ness, pitch, or time of O's verbal reports has 
any interesting or systematic relation either 
to the length of the rods or to estimates of 
their length. The behaviorist must not 
identify psychological length with some 
magnitude of O's responses; rather, he must
define psychological length in terms of O's
responses in some useful manner. Many of
Stevens’ (1959a, p. 52) remarks suggest the
erroneous identification, for example: 
“brightness . . . is the name for a response of 
a human organism to an external configura- 
tion of the environment.” 

Since every magnitude, psychological or otherwise, is defined by the
relational terms “greater than,” “equal to,” and “less than,” we
must, in defining psychological length, provide a rule for the
application of at least these three terms. Such a rule is contained
in Definition (i): If O estimates that Φ₁ is (physically) greater
than (equal to, less than) Φ₂, then Ψ₁ is (psychologically) greater
than (equal to, less than) Ψ₂. This statement defines psychological
length as an intensive magnitude, but fails to give it all the
features of the extensive magnitude scaled in Experiment M. To
generate an extensive magnitude a definition of psychological ratios
is also required. Hence we need also Definition (ii): If O estimates
that Φ₁ and Φ₂ stand in the (physical) ratio n:m, then Ψ₁ and Ψ₂
stand in the (psychological) ratio n:m. (i) and (ii) can now be
used, together with an arbitrarily chosen unit of psychological
length, to construct the ratio scale of psychological length in
Figure 2.
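
How Definition (ii) and an arbitrary unit together generate a scale
can be sketched in a few lines. The example is a hedged illustration,
not the procedure of Experiment M: the rod lengths are invented, the
top value of 100 is an arbitrary choice of unit, and the function name
is hypothetical. It assumes a chain of halving judgments in which each
rod is judged half as long as the one before it.

```python
def ratio_scale(halving_chain, top_psi=100.0, fraction=0.5):
    """Assign psychological values to a chain of fractionation judgments.

    `halving_chain` lists physical lengths; each entry was judged by O to
    be `fraction` of the previous entry. By Definition (ii) the
    psychological values then stand in that same ratio, so each entry
    receives `fraction` times the value of the one before it.
    """
    scale = {}
    psi = top_psi
    for phi in halving_chain:
        scale[phi] = psi
        psi *= fraction
    return scale

# Invented physical lengths: O's "halves" need not be physical halves.
chain = [100.0, 58.0, 33.0, 19.0]
scale = ratio_scale(chain)

for phi, psi in scale.items():
    print(f"physical {phi:6.1f} -> psychological {psi:6.1f}")
```

Plotting the resulting pairs would give a curve of the kind Figure 2
is said to show. Nothing in the construction requires O's estimates to
be physically accurate; only the reported ratios enter.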

Definition (ii) is, of course, the principle 
of correspondence encountered earlier. But 
by making the principle a definition the be- 
haviorist denies that it requires a supporting 
argument such as (Ca)-(Cb)-(Cl), a theory 
of O's perceptual mechanism. We argued 
that (Ca) is dubious and apparently un- 
verifiable, and that (Cb) is dubious and 
potentially circular. We argued that there are 
alternative theories to the one embodied in 
(Ca)-(Cb)-(Cl). The behaviorist is unaffected
by these objections, since he subscribes 
neither to (Ca) and (Cb), nor to any theory 
of O's perceptual mechanism which posits 
internal estimates as the basis for external 
estimates. He simply begins with the fact 
that O's responses to rod length are reliable, 
and then goes on to use the mak scale as a 
means for expressing mathematically the
relation between those estimates and actual 
rod length. 

Though immune to some objections, the
behaviorist interpretation of the principle of
correspondence may be open to others. If
the principle is merely a definition, then why
should we accept it? The reply cannot be
that O's estimates of Φ ratios are based on
his estimates of Ψ ratios and his estimates of
Ψ ratios are accurate. This would be to fall
back on the introspectionist view. Nor can 
the behaviorist maintain that (ii) is a natural 
definition, that is to say, a definition of 
ordinary terms, like “A vixen is a female 
fox.” Psychological length is a technical 
notion which the behaviorist psychophysi- 
cist introduces for extraordinary purposes. 
(ii) must, therefore, be regarded as a stipu- 
lative definition, which can be supported 
only by appealing to its scientific usefulness. 
Whether it is in fact useful will be discussed 
in the sequel. 

(i) and (ii) are examples of what are some- 
times called “operational definitions” of a 
psychological magnitude. Most behaviorist 
psychophysicists call for such definitions, 
assure the reader that they can be provided, 
and then fail to provide them. Stevens (1935) 
illustrates this sort of malpractice. Goude 
(1962, pp. 28-29) may be an exception. In 
failing to provide explicit statements of the 
definitions, the psychophysicist risks over- 
looking several troublesome but extremely 
important questions concerning the be- 
haviorist interpretation. One such question 
is: Are there as many types of psychological 
magnitude as there are types of estimate? 
Are psychological magnitudes constructed 
from different types of estimate incompar- 
able? 

Suppose O estimates in a halving experiment for length that


(7) Φ₁ = ½Φ₂ and Φ₂ = ½Φ₃,


and in an experiment requiring length estimates of one-third, he
says that


(8) Φ₁ = ⅓Φ₃.


Together with Definition (ii), (7) entails that


(9) Ψ₁ = ½Ψ₂ and Ψ₂ = ½Ψ₃.


Together with certain elementary rules of algebra, (9) entails that


(10) Ψ₁ = ¼Ψ₃.


On the other hand, (8) together with Definition (ii) entails that


(11) Ψ₁ = ⅓Ψ₃.


Apparently (10) and (11) are inconsistent; 
consequently, the estimates in (7) and (8) 
apparently conflict. The behaviorist may 
try to remove the inconsistency by main- 
taining that (a) Definition (ii) is not appli- 
cable to all O's fractionations, (b) the ele- 
mentary laws of algebra do not apply to 
psychological magnitudes, (c) O is mistaken 
in some of his estimates of psychological 
length, or (d) (10) and (11) describe two 
different psychological magnitudes. (a) is an 
unacceptable solution, since it undermines 
the definition which makes the construction 
of ratio scales of psychological length 
possible. (b) is also unacceptable, since it 
implies that the numerals assigned to psy- 
chological entities in experiments like M do 
not represent ratios. (c) is completely out of 
the question, since it carries the implication 
that psychological entities are observable, 
and thus plunges us back into an introspec- 
tionist framework. Only solution (d) re- 
mains. 
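
The conflict can be made concrete with exact arithmetic. The sketch
below reads (7) as two successive halving estimates and (8) as a
direct one-third estimate relating the same end rods, which is how the
surrounding prose describes them. The variable and function names are
hypothetical, and the code illustrates only the algebra, not any
experimental procedure.

```python
from fractions import Fraction

def chained_ratio(links):
    """Multiply successive ratio estimates to get the end-to-end ratio."""
    result = Fraction(1)
    for link in links:
        result *= link
    return result

# (7): the first rod is judged half the second, the second half the third.
halving_links = [Fraction(1, 2), Fraction(1, 2)]
# (8): the first rod is judged one-third of the third.
direct_estimate = Fraction(1, 3)

# Definition (ii) carries each reported ratio over to the psychological side.
via_halving = chained_ratio(halving_links)
conflict = via_halving != direct_estimate

print("ratio implied by halving:", via_halving)
print("ratio from direct estimate:", direct_estimate)
print("estimates conflict:", conflict)
```

The two routes assign different ratios to the same pair of rods, and
this is the inconsistency that solution (d) removes by treating the
results as values of two different magnitudes.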

The estimates on which Figure 1 is based 
do not lead to the sort of inconsistency il- 
lustrated above. But there is no assurance 
that all or even most actual experiments will 
be like M in this respect. It is always pos- 
sible—indeed, it is likely—that O will make 
conflicting estimates. What the behaviorist 
is forced to say about conflicting estimates 
shows that, even in those experiments which 
contain no conflicts, different fractional 
estimates create different psychological mag- 
nitudes. If this is so then it is wrong to 
suppose that in Figure 2 Points A-E and 
Points a-c lie on the same psychological 
continuum. The two sets of points do lie 
along the same line. But this is nothing 
more than an artifact of particular estimates 
made by a particular observer. The same 
observer at another time, or another ob- 
server, may easily produce conflicting esti- 
mates. Then the two sets of points will not 
even lie on the same line. 

Where we compare ratio estimates with 
interval estimates the “operationist” fea- 
tures of Definition (ii) become even more 
obvious. Suppose O halves three rods as in
(7), and then estimates length intervals for
the same rods by saying that 


(12) Φ₃ − Φ₂ = Φ₂ − Φ₁.


Together with a principle of correspondence for interval estimates,
(12) entails that


(13) Ψ₃ − Ψ₂ = Ψ₂ − Ψ₁.


Apparently (9) and (13) are inconsistent; consequently, the
estimates in (7) and (12) apparently conflict. The behaviorist may
try to remove the inconsistency with solutions analogous to (a),
(b), or (c). Those solutions will be unacceptable for similar
reasons. The conflict can be acceptably removed only by maintaining
that (9) and (13) describe different psychological magnitudes,
different psychological lengths.

In brief, if we hold that principle of correspondence (Cl) and other
similar principles are stipulative definitions, then we seem forced
to admit that there are as many types of psychological length as
there are types of estimate of physical length, that there is no
single continuum called “psychological length.” If this is so, then
no scale of psychological length can conflict with any other. A
ratio scale constructed from m/n estimates cannot conflict with a
ratio scale constructed from n/o or p/r estimates. Hence, Campbell
(in Ferguson et al., 1939-40, p. 338) is mistaken in his criticism
of ratio-scale measurement. He argues that if a loudness scale is
constructed from estimates of one-half, then “a sone will not be
estimated as a tenth of 10x sone.” What then, he wonders, is the
advantage of a figure like our Figure 2 over one like Figure 1? “Why
do not psychologists accept the natural and obvious conclusion that
subjective measurements of loudness in numerical terms (like those
of length or weight or brightness) are mutually inconsistent and
cannot be the basis of measurement?” The behaviorist ought to reply
that one-half scales and one-tenth scales are scales of different
psychological loudnesses, and cannot be inconsistent.

More obviously, a ratio scale constructed from fractionation
estimates cannot conflict with a partition scale constructed, say,
from equisection estimates; and neither of these can conflict with a
jnd scale. It may be supposed that since none of the three curves in
Figure 2 coincides with any of the others, 
only one can represent a “valid” scale of 
psychological length. But this is true only 
if the ratio scale, S_R, the partition scale,
S_P, and the jnd scale, S_J, are all scales of
one and the same psychological magnitude,
Ψ_A. Now according to the behaviorist, S_R,
S_P, and S_J are scales of different psychological
magnitudes, Ψ_A, Ψ_B, and Ψ_C, respectively.
Hence, the three scales do not compete
with one another, and the fact that they
do not coincide raises no question about 
their validity. 

Discussing the lack of correspondence 
between ratio and partition scales, Stevens 
(1960c, pp. 52-53) says: “... observers are 
so constituted that they are unable to partition
a prothetic continuum without a
systematic bias.” He tries to explain this
fact by suggesting that the observer’s sensitivity
is not uniform on the scale, being
greater in the lower ranges. To say that
partition estimates exhibit a systematic bias
implies that the partition scale and the scale
with which it is being compared are scales of
a single psychological continuum. More precisely,
the implication is that S_R and S_P are
scales of Ψ_A, that S_R and S_P do not coincide,
and that S_R is the “true” scale. The
behaviorist reaction is that S_R and S_P are
scales of different psychological magnitudes
and therefore do not compete. The one is as 
"true" a scale as the other; so the estimates 
producing the one are no more biased than 
the estimates producing the other. 

Stevens sometimes takes a different tack. 
At one point (1959c, p. 996) he writes that 
the question of scale validity is “a matter of 
opinion,” that “a judgement about validity 
always reduces ultimately to a value judge- 
ment,” and that “in the long run... it is 
the scientific community that will decide 
the issue.” There is no issue for the be- 
haviorist, since different types of psycholog- 
ical scale are scales of different psychological 
magnitudes and do not compete for valid- 
ity. At other points, Stevens adopts this 
"operationist" point of view. 


Since the three kinds of scales are nonlinearly related
on prothetic continua, it seems clear that
they must measure different things. Each is probably
a valid scale of something [1959c, p. 998].


Speaking of the three different types of scale 
for subjective finger span, he says: 


Obviously, three different aspects of finger span 
are being measured by these three functions. Al- 
though a certain amount of argument has revolved 
around the question of which of these functions is 
the “true” scale, it should be apparent that all 
three are true scales of something or other [Stevens 
& Stone, 1959, p. 94]. 


It is impossible to locate Stevens’ view pre- 
cisely, since he constantly shifts back and 
forth between a behaviorist and some other 
way of treating the question of scale validity. 


The Nature of Psychological Entities 


According to the behaviorist, the Ψs defined
in (i) and (ii) are not observed by E,
O, or anyone else. This harmonizes the interpretation
with the phenomenological facts of
Experiment M, which are as follows. The O 
sees rods and walls, feels chairs and tables, 
hears voices, and so on. But he does not 
observe, through some mysterious faculty of 
perception, psychological entities of which 
psychological length is a magnitude. The O 
observes by sight the experimental rods, and 
he estimates their relative physical length. 
The E measures the physical length of the 
rods, records O's estimates, and constructs a 
psychological magnitude in accordance with 
Definitions (i) and (ii). Neither O nor E 
observes psychological entities or magni- 
tudes. 

This feature of the view has an important
bearing on the question of observer accuracy.
Psychophysicists often say that experiments 
like M presuppose the ability of the observer 
to make accurate estimates of psychological 
magnitudes. Thus Stevens (1951, pp. 40-41) 
says:


[In fractionation procedures] we make an assumption
that calls for scrutiny. We postulate, among
other things, that the subject knows what a given 
numerical ratio is and that he can make a valid 
judgement of the numerical relation between two 
values of a psychological attribute. 


Such a statement can be understood only in 
the context of an introspectionist interpre- 
tation of scaling experiments. As a previous 
section has shown, the introspectionist bases 
principle of correspondence (Cl) on the assumption,
(Cb), that O's estimates of psychological
magnitude are accurate. But the
behaviorist maintains that O does not make 
estimates of psychological magnitudes, that 
O's estimates are of physical magnitudes like 
length, weight, etc. Now it is clear that Ex- 
periment M does not presuppose O's ability 
to make accurate estimates of physical 
length. Indeed, O's length estimates are in- 
accurate, which is typical of experiments of 
this sort. And E will often give O explicit 
instructions to provide naive estimates, to 
make no attempt at being accurate. Thus 
Garner tells his subjects: “Remember to try 
to assign numbers according to how loud 
the tones appear to you. We are interested in 
how loud tones seem to be to you, not in 
some kind of accuracy” (quoted by Stevens 
[1956, p. 17]). In sum, on the behaviorist 
view, O makes no estimates of psychological 
magnitudes, and the estimates which he does 
make—estimates of physical magnitudes— 
are not required or presupposed to be ac- 
curate. 

There are also important implications for 
a related question, that of the so-called 
“stimulus error.” As the term was originally 
introduced by Titchener (1905, p. xxvi), to 
commit the stimulus error is (a) “to confuse 
sensations with their stimuli,” “to read the 
character of the stimuli into the ‘sensa- 
tions’.” As Boring (1921, p. 451), who re- 
viewed the history of the notion, put it: 
“We commit the stimulus-error if we base 
our psychological reports upon objects rather 
than upon the mental material itself, or if, 
in the psychophysical experiment, we make 
judgements of the stimulus and not judge- 
ments of sensation”. However, the term 
has been used in a different, or at least 
extended way. Stevens (1959c, pp. 1002-1003),
in criticizing the physical correlate
theory, characterizes it as holding that “all 
quantitative estimates of sensory magnitude 
are really based on some form of ‘stimulus 
error’.” The theory under attack was de- 
vised by Warren (1958) and Warren, Sersen, 
and Pores (1958), who put it to experimental 
test with loudness. (See Warren & Warren 
[1963, pp. 804-808] for a further discussion 
of the theory.) They maintain that, knowing 
little or nothing about the magnitudes of
sound waves, typical observers base their
estimates of loudness on the (estimated) 
distance of the sound source. Now, to com- 
mit this error is not to confuse sensations 
with their stimuli, but rather (b) to confuse 
one stimulus with another. Thus we have a 
different meaning of the term “stimulus 
error." 

The behaviorist may with perfect con- 
sistency attack the physical correlate theory. 
He may regard it as an error to confuse one 
stimulus with another, and he may attempt 
to show by experiment that observers do not 
commit the error. (He may also, again with 
perfect consistency, attempt to show that 
observers do commit the error when esti- 
mating loudness, brightness, etc. But what, 
in Sense [b], would the stimulus error be for 
length, weight, etc.?) The behaviorist may,
therefore, attack the stimulus error in Sense 
(b). But he may not attack it in Sense (a). 
For on his view, typical observers do not 
commit any error when they make estimates 
of the stimulus in psychophysical experi- 
ments. That is precisely what they are 
asked by E to do. Indeed, the behaviorist 
cannot even admit the possibility of stimu- 
lus error in Sense (a). To commit the error 
in that sense is to be aware of the stimulus, 
to be aware of the sensation caused, and to 
"read" the former into the latter. But the 
behaviorist holds that observers are not 
aware of any sensations caused by rods, 
weights, etc. in psychophysical experiments
dealing with these stimuli. The behaviorist
experimenter may desire that his observers
estimate length, weight, etc. “as they see 
it," “as they feel it," etc., and he may give 
them instructions to that effect. But these 
will, nonetheless, be instructions to estimate 
the stimulus. 

One decided advantage in the behaviorist interpretation is its
removal of private entities from the conceptual structure of
Experiment M. By definition, a public entity can be observed by any
normal observer, a private entity by only one observer. Hence, the
distinction between public and private entities applies only to
observable entities, like rods and afterimages, and not to the
unobservable Ψs of the behaviorist interpretation. It follows that
we must not say that Ψs are public. But, when we recognize the
parallel mistake in saying that they are private, there is no longer
any inclination to regard them as directly inaccessible to everyone
but O, or to think of O as an intermediary between the investigating
scientist and a realm of private data. The dualism between public
and private data entirely collapses, and, as a result, several
disturbing theoretical and philosophical problems simply vanish.
Psychological entities are indeed unobservable; nevertheless, since
they are defined in terms of publicly observable rod presentations
and observer reports, there is no problem about the general
availability of psychological data or “objectivity” of results based
on these.

According to the behaviorist, Ψs are, to use a current phrase,
theoretical constructs. This view is suggested vaguely by Garner
(1954, p. 88) and incompletely by Stevens (1960a, p. 27). That is,
they are theoretical entities constructed by E by means of
Definitions (i) and (ii) for scientific uses. An analogy will help
to explain their nature. The O is asked in a two-part experiment to
estimate the dollar value of automobiles. In each trial of the first
part, E presents O with a standard auto and asks him to select a
comparison auto one-fourth as expensive. In each trial of the second
part O is asked to select comparison autos which are one-half as
expensive as the standards. A plot of the data thus obtained
produces lines identical to those in Figure 1, except that numerals
along the axes represent auto values in thousands of dollars. The E
now invokes a principle of correspondence, which says: If O
estimates that Φ₁ and Φ₂ stand in the ratio 1:2, then Ψ₁ and Ψ₂
stand in the ratio 1:2, where Φs represent the real value and Ψs the
estimated value of automobiles. The E stipulates that one unit of
estimated value equals 1,000 units (dollars) of real value, and,
using the principle of correspondence above, constructs a scale of
estimated value, plotting estimated against actual value by the same
method used to obtain Figure 2.

It is clearly a mistake to attempt to identify the Ψs in our analogy
with some group of privately observable entities, of which estimated
value is a magnitude. This attempt will lead to theories like the
following. When O estimates the value of a $5,800 automobile he has
a thought, Ψ₁; when he estimates the value of a $10,000 automobile
he has another thought, Ψ₂. Since Ψ₁ is (psychologically) half as
great as Ψ₂, O estimates that the values of the two automobiles
stand in the ratio 1:2. This theory is specious, firstly, because no
meaning can be given to the assertion that one thought is less great
or greater than another; and, secondly, because it is an obvious
fact that O makes estimates, not about his thoughts, but about his
automobiles. Similarly, it is an introspectionist mistake to try to
identify the Ψs of the mak scale with privately observable
sensations. Such identification commits us to the unclear and
possibly meaningless view that sensations can be greater or less
than one another; and it implies, falsely, that O makes estimates of
sensations.

It is also a mistake to ask for examples of 
the entities of which estimated value is a 
magnitude. Estimated value cannot be con- 
strued as a magnitude of O's utterances 
(their loudness, time, etc.), nor of automo-
biles, nor as a magnitude of any other ob-
servable group of entities. Analogously, it is
a mistake to ask for examples of the entities 
of which psychological length is a magnitude. 
Behind this request lies the introspectionist 
belief that Ψs can be ostensively defined and
are thus observable entities, examples of 
which can literally be pointed to by some- 
one. One completely understands the nature 
of psychological entities and their magni- 
tudes when he understands Definitions (i) 
and (ii). To ask for examples, after having 
carefully studied the definitions, betrays
misunderstanding. 

In view of the above considerations, it is 
tempting to say that on the behaviorist view 
psychological entities are fictions and do not 
exist. Although true in one sense of "exist" 
(the sense in which observables exist), this is 
apt to be misleading, to suggest that the 
psychophysicist is trying to measure noth- 
ing. It is more helpful to point out that the 
concept of psychological magnitude is the 
product of a façon de parler. Consider again
our analogy. When we say, (a): “The esti-
mated value of the one automobile is half
the estimated value of the other,” we do not
wish to create the impression that automo-
biles have two sorts of value, estimated and
actual value, in the way that they have 
ballast value on a ship as well as financial 
value on the market. (a) is just another way 
of saying, (b): “O estimates that the value 
of the one automobile is half the value of the 
other.” (a) means nothing more than what is 
meant by (b), and (b) does not imply that 
estimated and actual value are two sorts of 
value possessed by automobiles. Only money 
value is involved, although an economist or 
someone else may be interested in O's esti-
mates of this money value. (Other examples:
“alleged” and “confirmed” age are not two
sorts of age, nor “probable” and “deter-
mined” area two sorts of area, nor “ap-
parent” and “factual” cause two sorts of
cause.) Analogously, when we say, (c) “The
estimated length of the one rod is half the 
estimated length of the other,” we do not 
imply that estimated length and actual 
length are two sorts of length possessed by 
rods. (c) is just another way of saying, (d) 
“O estimates that the length of the one rod 
is half the length of the other.” And (d) 
carries no implication of two sorts of length. 
Rods possess one sort of length: physical 
length, if you will. But the psychophysicist 
is interested in the estimates which observers 
make of the physical length of rods. The E 
may express facts about such estimates by 
speaking of “estimated length” as opposed 
to “actual (real) length.” But it is clear that
such talk is merely a façon de parler which
E finds it convenient to use in describing the 
behavior of his observers. 

It is of interest to bring our analogy to 
bear on Stevens’ (1959a, pp. 50ff.) suggestion 
that utility can be measured by the scaling 
procedures used for perceptual magnitudes. 
He describes an experiment by Galanter in 
which O is given instructions like the follow-
ing: “Suppose I were to tell you that I am
going to give you $10. That would make you 
happy, would it not? All right, now think 
this over carefully. How much would I have 
to give you to make you twice as happy?” 
The O's responses are used to construct a 
scale of “utility,” or “subjective value,” 
measured in “utiles.” (Presumably the 
method is like that used in deriving Figure 2 
from Figure 1.) This experiment differs
slightly from the one in our analogy. There
O was estimating the dollar value of a class
of goods. Galanter's observer makes esti- 
mates of dollars. But he does not estimate 
the dollar value of dollars. Such estimates 
would be nonsensical, unless O were esti- 
mating the new dollar value of dollars in 
terms of their old dollar value, which he is 
not. Rather, O estimates the capacity of 
money to make him happy. And the utility, 
or subjective value, of money is defined in 
terms of these estimates. Hence, to say, 
(e) “$18 has twice as much utility (subjective
value) as $10 for O” is just to say, (f) “O
estimates that $18 has twice the capacity of
$10 to make him happy.” Employed in this
way, the concept of utility is the product of
a façon de parler. The same analysis applies
to psychological length, as the behaviorist
employs that concept. The point of the
façon de parler is, of course, to make the
phenomena spoken of amenable to measure- 
ment. 


The Purpose of Ratio Scaling 


In a previous section we noted that the 
behaviorist can argue the acceptance of prin-
ciple of correspondence (C1) only on grounds
of scientific usefulness. Now obviously the
reason for accepting (C1) is that it makes the
construction of the mak scale possible. But 
what reason is there for constructing the 
scale? Well, its construction makes the 
formulation and verification of Psychophysi- 
cal Law (5) possible. But what reason is 
there for formulating and verifying (5)? The 
behaviorist cannot answer, in the manner of 
the introspectionist, that (5) is a law relating 
two variables, Ψ and Φ, defined independ-
ently of one another, and that the discovery
of such laws needs no justification. For Ψ is
defined in (i) and (ii) in terms of O's Φ
estimates. Nor can the behaviorist plead that
(5) explains how O makes reliable estimates 
of physical length. (5) could provide such an 
explanation only in conjunction with a the- 
ory of O's perceptual mechanism like (Ca)- 
(Cb)-(C1). But the behaviorist does not link
(5) to any such theory. Consequently, he 
appears reduced to saying this: (5) does not 
tell us how O estimates physical length; 
rather it is a mathematical description of 
what O estimates physical length to be. (5)
is (to employ some current terminology) not
an explanation; rather it is a mathematical
description of O’s estimates of physical
length. The reason for accepting (C1) is that
it makes this mathematical description 
possible. 

The behaviorist may complain that (5) is 
more than a mere description of O's length 
estimates, since it enables us to predict new 
data, to predict ratio estimates which were 
not made by O in obtaining the data on 
which Figure 1 is based. By using (5) we can 
predict, for any Φ_s, which Φ_c O would esti-
mate to be one-half, one-fourth, etc. Now
isn’t predictive power what distinguishes an 
explanatory law from a merely descriptive 
one? Isn’t any explanatory law, (5) or any 
other, simply one with predictive power? 
This reply suggests the view that (a) psycho- 
logical explanation consists in the discovery 
of empirical laws which describe and predict 
the behavior of organisms. The opposing 
view is that (b) psychological explanation 
consists in discovering the mechanisms, men-
tal or physiological, which underlie the be-
havior of organisms.

It is not necessary for us to ask which of 
these two views is correct, nor to ask which
is behaviorist and which is introspectionist. 
We will remark only that there seems to be no 
reason why a behaviorist cannot accept ex- 
planations in terms of some mechanisms, i.e., 
those which are physiological and do not
posit internal estimates as the basis for ex-
ternal estimates. The solution to these
difficult questions does not affect the differ- 
ence between the introspectionist and be- 
haviorist justifications of Psychophysical 
Law (5). If the behaviorist adopts (b), then 
he is required to say that (5) is descriptive 
and not explanatory, since it is not, on his 
view, a theory or part of a theory of O's per- 
ceptual mechanisms. If he adopts (a), then 
he may claim that (5) is in one sense an ex- 
planatory law, since it has predictive power. 
But he must deny that it is explanatory in 
the other sense. For he denies that O per- 
ceives and makes estimates concerning ex- 
ternal, physical entities by perceiving and 
making estimates concerning internal, psy-
chological entities, whereas such an hypoth-
esis is essential for (5) to be a theory or part
of a theory of O's perceptual mechanism. 

If the introspectionist justification cannot
be adopted by the behaviorist, then what, on
the latter’s view, is the purpose of ratio scal- 
ing? That Psychophysical Law (5) can be 
used in describing and predicting O's frac- 
tionation estimates may appear to be suffi- 
cient reason for employing any scaling 
procedure which enables us to formulate and 
verify the law. But is this so? (3) can be used 
in describing and predicting O's one-fourth 
estimates, (4) in describing and predicting 
O's one-half estimates. Other laws of the 
same type can be used in describing and pre- 
dicting other fractionation estimates. None 
of these laws presupposes the concept of 
psychological length. It is obviously useful 
to construct the lines and discover the equa- 
tions of Figure 1. But what additional pur- 
pose is served by going on to construct the 
line and formulate the equation in Figure 2, 
a construction which presupposes the con- 
cept of psychological magnitude? More gen- 
erally, the question is this: Is there any 
legitimate purpose in constructing ratio 
scales of psychological magnitude which can- 
not be equally well achieved without the 
concept of psychological magnitude? If the 
answer is negative, then the concept of 
psychological magnitude may and should 
(for reasons to be given later) be dispensed 
with. We will attempt to discredit three ar- 
guments which conclude that the concept is 
indispensable. 

First argument. One argument insists that 
without the concept of psychological magni- 
tude we would not be able to formulate the 
general psychophysical laws which have been 
advanced in the past few decades. (3), (4), 
and equations for other estimation-ratios 
can be obtained from (5), and (5) can in 
turn be obtained from them. This can be 
accomplished by using the equation 


(14) a = b^n,
or, logarithmically,
(15) n = log a/log b,


where a is the fraction (one-fourth, one-half,
etc.) O is required to estimate, b = Φ_c/Φ_s,
and n is the exponent in an equation like (5).
Thus it is possible to perform the useful op- 
eration of predicting the results of new frac- 
tionation experiments. (Ekman [1958, p. 288]
and Treisman [1964b, p. 387] both suggest an
algebraic method which is essentially the 
same as that offered here.) In view of the 
algebraic relations above, (5) may be con- 
sidered as a kind of general law, of which (3), 
(4), and certain other fractionation equations 
are, in a sense, instances. Now it may be 
supposed that such general laws must take 
the form of (5), that is, must relate some 
psychological magnitude to a stimulus mag- 
nitude. 

This claim appears to be false. Given (14), 
we can write 


(16) b = a^(1/n),


and can then substitute into the equation,
Φ_c = bΦ_s, to obtain:


(17) Φ_c = a^(1/n) Φ_s.


Where n has the value determined in Experi- 
ment M, (8) becomes 


(18) $, = 22/9 6,. 


(3), (4), and all the other fractionation equa- 
tions which can be obtained from (5) by the 
method in the previous paragraph can also be 
obtained from (18) by the same method. And 
from every fractionation equation from 
which we can obtain (5) by the algebraic
method in the previous paragraph we can 
also obtain (18) by the same method. Thus 
(18) is as general, and as useful in prediction, 
as is (5); and yet it does not employ the con- 
cept of psychological length. 
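
The algebra of (14) through (17) can be sketched in a few lines of Python. This is only an illustration: the function names, the exponent 1.1, and the 10-cm standard are assumptions introduced here, not values reported for Experiment M. The sketch shows that fractionation settings can be predicted, and the exponent recovered, using physical variables alone.

```python
import math


def exponent_from_fraction(a: float, b: float) -> float:
    """Equation (15): n = log a / log b, where b = phi_c / phi_s."""
    return math.log(a) / math.log(b)


def comparison_stimulus(a: float, n: float, phi_s: float) -> float:
    """Equation (17): phi_c = a**(1/n) * phi_s."""
    return a ** (1.0 / n) * phi_s


# Hypothetical values, chosen only for illustration.
n = 1.1        # assumed exponent
phi_s = 10.0   # assumed standard rod length in cm

half = comparison_stimulus(0.5, n, phi_s)      # predicted one-half setting
quarter = comparison_stimulus(0.25, n, phi_s)  # predicted one-fourth setting

# Going the other way: recover n from the predicted one-half setting.
n_back = exponent_from_fraction(0.5, half / phi_s)
```

Under these assumptions the one-fourth setting is what results from applying the one-half ratio twice, which is the sense in which the individual fractionation equations are instances of one general physical-variable equation.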

Since the above argument assumes that all 
fractionation equations are linear, its con- 
clusion is not a general one. The general 
claim is as follows: Given any equation, 
Ψ = F(Φ), from which we can obtain and
which can be obtained from any of the frac-
tionation equations, Φ_c = f_1(Φ_s), Φ_c =
f_2(Φ_s), ..., Φ_c = f_n(Φ_s), by the method
described above; there is another equation,
Φ_c = G(Φ_s), which we can obtain from and
which can be obtained from any of the same 
fractionation equations by the same method. 
If this claim is true, then the most general 
psychophysical laws can be formulated with- 
out employing the concept of psychological 
magnitude. The G equation is of an entirely 
different type than the F equation, since it 
contains only physical variables. But it is,
nonetheless, a psychophysical equation, an
equation which can be used to describe and 
predict O’s ratio estimates of physical 
length. 

Second argument. This argument main- 
tains that without the concept of psycho- 
logical length we would be unable to make 
the comparisons made in Figure 2 between 
ratio scales, partition scales, and jnd scales.
Thus stated the argument is trivial and
question begging. Naturally, if we do not
construct a scale, S_r, of psychological
length, then we cannot compare it with
scales S_p and S_j, since there is nothing with
which to make the comparison. The question 
is: Why should we construct any scales of 
psychological length? There may indeed be 
good reasons for constructing psychological 
scales of some kind. But is there any good
reason for constructing scales of psychological 
magnitude? Why even introduce the concept 
of psychological magnitude? The answer will 
be that if we do not we cannot compare ratio 
estimates with interval estimates, nor either 
of these with jnd estimates. 

This argument is unconvincing, since there 
seem to be other ways of comparing the 
various types of estimate. Let us illustrate 
an alternative method by comparing jnd 
estimates with ratio estimates of one-half. 
The upper line in Figure 2 is the result of 
plotting number of jnd’s against rod length. 
By using this line we can plot in Figure 1 the 
length of the stimulus associated with n jnd’s 
(ordinate) against the length of the stimulus 
associated with 2n jnd’s (abscissa). For ex- 
ample, Figure 2 shows that the stimuli asso- 
ciated with 20 and 40 jnd's are 5 and 11 cm., 
respectively. Accordingly, we place a point 
at 5 on the y axis and 11 on the x axis of 
Figure 1. Arbitrarily selected points ob- 
tained in this manner are connected by the 
lower of the two dash lines in Figure 1. That 
the dash line does not coincide with Line (4) 
presumably shows that O does not estimate 
length ratios of one-half on the basis of 
length jnd’s. (If coincidence had been the 
result, would this have shown that O does 
estimate length ratios of one-half on the
basis of length jnd’s?) Similarly, we can com-
pare jnd estimates with ratio estimates of
one-fourth by plotting in Figure 1 the length
of the stimulus associated with n jnd’s
against the length of the stimulus associated
with 4n jnd’s.
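
The plotting procedure just described can be sketched as follows. The function name and the table layout are assumptions made for illustration; only the single pair of values (20 jnd's at 5 cm, 40 jnd's at 11 cm) comes from the text, and a fuller table would have to be read off Figure 2.

```python
def half_jnd_pairs(length_at_jnd: dict[int, float]) -> list[tuple[float, float]]:
    """Return (x, y) points for Figure 1.

    x is the stimulus length associated with 2n jnd's (abscissa),
    y is the stimulus length associated with n jnd's (ordinate).
    """
    pairs = []
    for n, y in sorted(length_at_jnd.items()):
        if 2 * n in length_at_jnd:
            pairs.append((length_at_jnd[2 * n], y))
    return pairs


# Stimulus length (cm) associated with a given number of jnd's.
# Only these two entries are taken from the text.
length_at_jnd = {20: 5.0, 40: 11.0}

points = half_jnd_pairs(length_at_jnd)
```

Note that the table maps counts of jnd's to physical lengths; nothing in it represents a psychological magnitude. Replacing `2 * n` with `4 * n` would give the corresponding comparison with one-fourth estimates.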

We can use this same method to compare 
partition estimates with ratio estimates, so 
long as the partition interval is relatively 
small and the smallest stimulus estimated is 
near physical zero. The numerals on the in-
side left-hand ordinate of Figure 2 can be
taken to represent merely the number of
apparently equal intervals associated with a
given stimulus, just as the numerals on the
outside right-hand ordinate represent the 
number of jnd intervals associated with a 
given stimulus. Then we may plot in Figure 
1 the length of the stimulus associated with n 
apparently equal intervals against the length 
of the stimulus associated with 2n appar- 
ently-equal intervals. If we do this we obtain 
the upper of the two dash lines in Figure 1. 
That this dash line does not coincide with 
Line (4) presumably shows that O does not 
estimate length ratios of one-half on the 
basis of equal-appearing intervals. 

It is important to understand that in our 
method of comparison, the ordinate num- 
erals of the category scale and jnd scale 
represent, not the size of intervals along a 
psychological continuum, but merely the 
number of intervals associated with various 
stimuli. Our method makes the harmless 
and probably trivial assumption that jnd’s 
are similar in some respect, and that ap- 
parently equal intervals are similar in some 
respect; but it does not assume that jnd 
intervals, or apparently equal intervals, are 
equal on some psychological continuum. 
(Similarly, to count off the number of 
octaves in the musical scale assumes that 
octaves are similar in some respect, but not 
that they are equal on a psychological 
continuum.) If the ordinate numerals repre- 
sent equal intervals on a psychological con- 
tinuum, then the following principle of 
correspondence is presupposed: If O esti-
mates that Φ_1 − Φ_2 = Φ_3 − Φ_4, then Ψ_1 −
Ψ_2 = Ψ_3 − Ψ_4. But our method presupposes
no such principle. We can speak of the 
number of similar intervals without em- 
ploying the concept of a psychological 
entity or psychological magnitude. Our 
method thus makes it possible to compare
O's jnd estimates and partition estimates
with ratio estimates without measuring,
scaling, representing, mentioning, or pre-
supposing a psychological magnitude called
psychological length. This is not to suggest
that it is the only method with this feature.
Avoiding the concept of psychological mag-
nitude seems to depend only on the in-
genuity of the theorist or experimenter in
representing the data of psychological
experiments.

FIG. 3. The loudness function and the size of
the cochlear potential. The solid curve represents
the loudness function arrived at by Churcher from
tone-fractionation experiments. The circles repre-
sent the averaged results of measurements of the
size of the cochlear potentials at the round win-
dows of six guinea pigs as a function of the inten-
sity of the stimulus. (Taken from Stevens & Davis,
1936, p. 5.)

Third argument. This argument contends 
that without the concept of psychological 
magnitude we would not be able to com- 
pare psychophysical functions with physio- 
logical functions. Figure 3, taken from 
Stevens and Davis (1936, p. 5) shows that 
when cochlear potential (multiplied by a 
constant) and psychological loudness are 
plotted against sound-wave intensity, the 
curves thus obtained show a high degree of 
correspondence. (That the loudness func- 
tion in this figure is no longer accepted is 
irrelevant to our argument.) That is to say, 
where Ψ = f_1(Φ) is the psychophysical
function and Θ = f_2(Φ) the physiological
function (Θ designates the physiological
magnitude), f_1 is linearly related to f_2. Con-
sequently, the authors suggest that “as a
first approximation, the form of the loud-
ness function is imposed by the behavior of
the cochlear mechanism;” but they point
out that “identification of the loudness 
function with the recorded potential must 
[in view of divergences] be made with reser- 
vations” (p. 6). Now it may be said that 
such interesting suggestions could not be 
made without employing the concept of 
psychological magnitude. 

Thus stated, this argument begs the 
question. Let Ψ and Θ be a psychological and
a physiological magnitude, respectively. It
is true that we cannot compare a psycho-
physical function, Ψ = f_1(Φ), with a physio-
logicophysical function, Θ = f_2(Φ), without
employing the concept of psychological mag-
nitude. But this observation is trivially
true, since, by hypothesis, one of the func-
tions to be compared contains Ψ as a vari-
able. If we are to avoid begging the question,
we must ask whether the results of psycho- 
physical experiments can be compared with 
the results of physiologicophysical experi- 
ments without employing the concept of 
psychological magnitude, whether such com- 
parisons can be made without comparing 
Ψ-Φ functions with Θ-Φ functions. The
answer is, apparently, that it is possible. 

Let Θ_c be the physiological process (coch-
lear potential, area of retinal stimulation,
etc.) associated with the physical comparison
stimulus, Φ_c (sound-wave intensity, rod
length, etc.). And let Θ_s be the physiological
process associated with the standard stimu-
lus Φ_s. In addition to fractionation func-
tions of the form, Φ_c = f_3(Φ_s), we can also
formulate and verify associated functions of
the form Θ_c = f_4(Θ_s). If f_3 = f_4, or if f_3 and
f_4 are linearly related, then we may wish to
conclude that Θs are the physiological
processes underlying or determining ratio
estimates of Φs. The comparison and dis-
covery of functions of this type can be 
accomplished without measuring, scaling, 
representing, mentioning, or presupposing 
psychological magnitudes. (The measure- 
ment of physical and physiological magni- 
tudes is, of course, required; but there is no 
problem here.) Since Φ-Φ and Θ-Θ functions
appear to be as useful as Ψ-Φ and Θ-Φ func-
tions in comparing the results of psycho-
physical and physiological experiments, and
since the former functions do not employ 
the concept of psychological magnitude, 
why should we retain the concept? No claim 
is made here that the comparison method 
suggested is the only possible one which 
avoids the concept of psychological magni- 
tude. The point is rather that avoiding the 
concept seems to depend solely on the in- 
genuity of the experimenter or theorist in 
representing the results of relevant experi- 
ments. 

An additional point of considerable force 
against each of the foregoing arguments for 
psychological magnitude is that principle of
correspondence (C1) is, in an important
sense, arbitrary. Ekman & Sjöberg (1965, p.
452) may have some point like this in mind
when they say: “According to a strictly
behaviorist view of perception ... the [psy-
chological] scale is an arbitrary and possibly
trivial transformation of response data ...”
Instead of (C1), why not assume that when
O estimates that Φ_1 and Φ_2 stand in the
ratio 1:2, Ψ_1 and Ψ_2 stand in some other
ratio, say 4:7? If we make this assumption,
then (5) can no longer be derived from (4),
so that (3) and (4) can no longer be regarded
as instances of the single general Psycho-
physical Law (5). Furthermore, if we change
the psychological unit (to 1 mak = 4 cm.),
a Ψ-Φ curve can be derived by employing
the new principle which is in closer corre-
spondence with the partition curve in Figure
2. Then it becomes easier to argue that ratio
and partition scales are equally “valid”
scales of psychological length. Finally, the
Ψ-Φ curve derived from the new principle
may be in better, or worse, correspondence
with any Θ-Φ curve which has been dis-
covered. If agreement is worse then it may
no longer be possible to argue that the Θ-Φ
curve describes the physiological mechanism
underlying O's ratio estimates. The solid 
line in Figure 3 was derived by making use 
of an assumption like (C1) for loudness-
frequency, and also the assumption that a 
tone introduced into one ear sounds half as 
loud as the same tone introduced into both 
ears. If these assumptions are replaced by 
others then it is no longer possible to argue 
that “the form of the loudness function is
imposed by the behavior of the cochlear
mechanism.” 

By assuming still other principles of 
correspondence, Experiment M can be made
to verify a psychophysical log law instead
of a psychophysical power law. Each of the
alternative principles of correspondence so
far considered assumes that equal physical
ratios are accompanied by equal psycho-
logical ratios. That is to say, they are all
instances of principle (C): If O estimates
that Φ_1/Φ_2 = Φ_3/Φ_4, then Ψ_1/Ψ_2 = Ψ_3/Ψ_4.
But why should we make this assumption?
Why not assume instead that equal physical
ratios are accompanied by equal psycho-
logical differences? This would be to assume
that: If O estimates that Φ_1/Φ_2 = Φ_3/Φ_4, then
Ψ_1 − Ψ_2 = Ψ_3 − Ψ_4. On this assumption,
Experiment M will establish that the psy-
chophysical law for length is an instance of 
(2), which is of course Fechner's law. This 
point is made by Treisman (1964a, pp. 
12-16; 1964b, pp. 387-388). The only weak- 
ness in his argument is his assumption that 
Ψ is some neural effect in O, which leaves
him open to the criticism made by Stevens 
(1964, pp. 383-384), who objects that 
ratio-scaling experiments need not posit 
intervening neural variables. Treisman’s 
argument can be restated without the fatal 
assumption, along the lines suggested above. 
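
The dependence of the resulting law on the principle of correspondence can be sketched numerically. Everything named below is an assumption made for illustration: the physical ratio 0.53 that O is supposed to set for "one-half," the unit step of −1 per halving, and the function names. The sketch takes one and the same fractionation result and derives a power function under principle (C) and a logarithmic function under the difference principle.

```python
import math


def power_law_exponent(a: float, b: float) -> float:
    """Principle (C): physical ratio b goes with psychological ratio a,
    so psi = k * phi**n with n = log a / log b."""
    return math.log(a) / math.log(b)


def log_law_slope(d: float, b: float) -> float:
    """Difference principle: physical ratio b goes with a fixed
    psychological difference d, so psi = slope * log(phi) + const."""
    return d / math.log(b)


b = 0.53                         # hypothetical ratio set for "one-half"
n = power_law_exponent(0.5, b)   # exponent under principle (C)
slope = log_law_slope(-1.0, b)   # each halving lowers psi by one unit


def psi_power(phi: float) -> float:
    return phi ** n


def psi_log(phi: float) -> float:
    return slope * math.log(phi)


phi = 12.0  # arbitrary standard length
ratio = psi_power(b * phi) / psi_power(phi)   # constant ratio
difference = psi_log(b * phi) - psi_log(phi)  # constant difference
```

Both functions reproduce the same physical datum (O sets the comparison at b times the standard); they differ only in the stipulation that converts it into a scale.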

The point above applies to experiments 
employing the technique of numerical esti- 
mation as well as to those employing the 
technique of fractionation. Many theorists 
have distinguished between “direct” and
“indirect” psychological measurement. Ek-
man (1961, pp. 35, 43) says of Stevens' 
technique: “[I shall call it] direct scaling of 
subjective variables, because the essential 
steps of the scaling procedure are implied 
in the experimental situation.” Thus when 
O assigns 100 to the loudness of a standard 
tone and 62 to that of a comparison, “the 
two scale values are, by definition, on a 
ratio scale: the ratio 62/100 should, accord- 
ing to the instructions, be equal to the sub- 
jective ratio of the second loudness to the 
loudness of the standard.” Ekman says of 
Thurstone’s methods, and would say the 
same of Fechner’s, that they are “indirect
methods, since they are based on a set of 


assumptions intervening between the ex- 
perimental data and the final scale." Em- 
ploying this distinction, one might wish to 
argue that fractionation methods are indir- 
ect, since they involve intervening assump- 
tions; whereas numerical-estimation 
methods involve no such assumptions and 
are therefore direct. If this were true, then 
different psychophysical laws could not be 
derived from numerical-estimation experi- 
ments, since there would be no intervening 
assumptions, no principles of correspond- 
ence, to manipulate. But it is not true, for 
the method of numerical estimation assumes 
principle of correspondence (D1): If O as-
signs to stimuli Φ_1, Φ_2, Φ_3, etc. the numerals
m, n, o, etc. respectively, then Ψ_1, Ψ_2, Ψ_3,
etc. stand in the ratios m:n:o: etc. This is
tacitly admitted by Ekman when he says 
that scale values in a numerical estimation 
experiment are “by definition, on a ratio 
scale." The “definition” in question is just 
the principle of correspondence stated above. 
Now other definitions, other principles of 
correspondence, can as easily be adopted. 
For instance: If O assigns to stimuli Φ_1, Φ_2,
Φ_3, etc. the numerals m, n, o, etc., then the
difference between Ψ_1 and Ψ_2 is m − n, the
difference between Ψ_2 and Ψ_3 is n − o, etc.
Again, this assumption leads to Fechner's 
logarithmic law rather than Stevens’ power 
law. The method of numerical estimation is 
not, in Ekman's sense, direct. (Indeed, no 
psychophysical method invented is direct in 
his sense; and probably none could be in- 
vented.) Hence, data obtained by employing 
the method are as subject to arbitrary 
manipulation as those obtained on any other 
method. 

Where the concept of psychological length
is employed, comparisons between different
ratio estimates, between different psycho-
physical functions and between psychophysi-
cal and physiologicophysical functions de-
pend entirely on the principles of correspond-
ence employed. But how do we know which 
principles to use? The choice seems arbi- 
trary, and, consequently, so do the results 
of the various comparisons. This arbitrari- 
ness is not present where we employ methods 
of comparison which do not employ the 
concept of psychological magnitude. There-
fore, it seems not merely possible, but also
advisable to discard the concept. 


THE IMPORTANCE OF THE Two 
INTERPRETATIONS 


Failure to Distinguish the Interpretations 


The present writer is not aware of any 
place in the literature where the introspec- 
tionist and behaviorist interpretations of 
ratio scales are explicitly and systematically 
distinguished. Nor do psychophysicists in
general appear to observe the distinction in 
presenting and discussing the results of their 
experiments. There are two major symptoms 
of this deficiency: unclarity as to what the 
Os in psychophysical experiments estimate, 
and unclarity as to what the Es in psycho- 
physical experiments measure. It is especi- 
ally disconcerting, and especially important, 
to discover such unclarities in the writing 
of S. S. Stevens, the major architect and 
methodologist of the so-called “new” psycho-
physics. If Stevens is unclear, then so is a
vast part of contemporary psychophysics.
His writings will, therefore, be given most
attention. 

First symptom. On the behaviorist inter-
pretation O in an experiment like M is said
to provide quantitative estimates of Φs,
that is, physical entities or stimuli, like rods,
weights and so on. The O is not said to esti-
mate or even be aware of any Ψs or Ψ magni-
tudes. On the introspectionist interpretation,
O is thought to make direct quantitative 
estimates of psychological entities, such as 
sensations—or at least estimates of psycho- 
logical magnitudes—in order to make in- 
direct estimates of physical or stimulus 
magnitudes. Stevens constantly shifts from 
one of these positions to the other. He men- 
tions all the following as estimated by O: 

(a) “the standard stimulus and a set of
variable stimuli” (Stevens, 1956, p. 25),
(b) “the stimuli that arouse two sensory
magnitudes” (Stevens, 1956, p. 1), (c)
“[an attribute of] the stimuli” (Stevens &
Harris, 1962, p. 489), (d) “the apparent
magnitude [of the stimulus] as he perceives
it” (Stevens, 1960c, p. 54), (e) “the apparent
magnitude [of the stimuli]” (Stevens, 1958b,
p. 193), (f) “the apparent strength [of the
stimuli]” (Stevens, 1960b, p. 239), (g) “the
apparent strength or intensity of his sub-
jective impressions” (Stevens, 1960b, p.
232), (h) “some aspect of his experience”
(Stevens, 1956, p. 18), (i) “subjective
events” or “sensations” (Stevens, 1957, p.
163), (j) “sensations” (Stevens, 1954, p. 30;
1956, pp. 24-25), (k) “the magnitude of a
given sensation” (Stevens, Mack, & Stevens,
1960, p. 64), (l) “the relative magnitudes of
... sensation” (Stevens, 1954, p. 30), (m)
“attributes of sensation” (Stevens, 1936,
p. 406), (n) “the apparent intensity of
sensations aroused” (Stevens, Mack, &
Stevens, 1960, p. 60), (o) “the apparent
strengths of the sensations produced”
(Stevens, 1960b, p. 238).

In (a) through (c), O is said to estimate
stimuli or stimulus magnitudes, so a be-
haviorist interpretation is implied. In (i)
through (o), O is said to estimate psycho-
logical entities or magnitudes, so an intro-
spectionist interpretation is implied. (d)
through (h) are sufficiently ambiguous to be
consistent with either interpretation. Notice 
that conflicting interpretations are some- 
times implied in the same article, sometimes 
on the same page! Notice also that the 
vacillation between interpretations continues 
for a period of three decades to the present. 

Stevens might wish to reply that the 
vacillation is only apparent, the result of 
using convenient locutions. For in one place 
(Stevens, 1951, p. 40) he says in describing 
fractionation experiments: 


... a pair of stimuli are given, and the subject
estimates the numerical value of their apparent
ratio. (More properly stated, he estimates the 
numerical ratio between the two magnitudes of an 
attribute of the sensation aroused by the two 
stimuli, but for the sake of brevity we say simply 
that he estimates the apparent ratio of the stim- 
uli.) 


But this explanation commits Stevens to an 
introspectionist view of ratio-scaling meth- 
ods. And it is difficult to believe that he 
wishes to be so committed. 

Galanter (1962, p. 142) provides us with 
a clear and instructive example of the same 
vacillation within a single paragraph. He is 
discussing the category scaling of subjective 
length. 


The failure of the subject to recognize the re-
peated presentations of the stimuli is not relevant
in this category scaling experiment; rather it is his 
judgements about the relative magnitudes of the 
stimuli that are sought. The experimenter can 
never decide whether the subject is right or wrong 
in a scaling experiment; there is therefore no nat- 
ural way to introduce an outcome structure into 
this kind of experiment. In scaling experiments we
are forced to assume the uncontrolled ability of the 
subject to report his sensations. As we shall see, the 
reproducibility of the data upon repetition of this 
experiment lends some support to this assumption. 
[italics added.] 


The occurrence of the italicized phrases 
shows that Galanter describes the scaling 
experiment in conflicting ways: first as one 
in which O estimates stimuli, and then as
one in which O estimates sensations. If E 
cannot decide whether O's estimates are 
right or wrong, then these estimates cannot 
be of stimuli, since E can decide whether 
estimates of stimuli are right or wrong. So 
we seem forced to conclude that O's esti- 
mates are of his sensations, since the 
accuracy of sensation estimates cannot be 
decided by E. But this is to adopt an intro- 
spectionist interpretation, which Galanter 
would presumably disavow. What he should 
say is that O estimates stimuli and that the 
accuracy of these estimates can be decided 
by E, but that E does not attempt to elicit 
accurate estimates from O. The remark that 
E must assume the accuracy of O's estimates 
is a sure sign that the writer either is an
introspectionist or is confused.

Warren & Warren (1963, pp. 804-805) 
seem to have noticed the vacillation illus- 
trated above. 


The New Psychophysics also makes a funda- 
mental distinction between quantitative judg- 
ments of subjective (psychological) magni- 
tudes and estimates of physical magnitudes (Ste- 
vens, 1958). Thus, in order to construct the veg 
scale of subjective heaviness, Ss’ judgments are 
considered distinct from estimates of physical 
weight. However, in the experiment of Harper and 
Stevens (1948), Ss were instructed to select that 
weight “which feels half as heavy as the standard.” 
It is difficult to see how this phrasing differs from 
instructions to choose an object which seems to be 
half physical weight. Yet this distinction is essen- 
tial for Stevens’ psychological continua. 


These writers are correct in pointing out that 
Stevens and his co-workers often instruct 
their observers to estimate physical magni- 
tude, and at other times speak as if judg-
ments of psychological magnitude are being
required. But they are wrong in their impli- 
cation that the new psychophysicists possess 
an official distinction between judgments of 
psychological magnitudes and estimates of 
physical magnitude, and that it is the former 
which theoretically are required of observers 
in psychophysical experiments. The dis- 
tinction seems never to have occurred to 
most of the psychophysicists in question, 
and they are far from clear as to what they 
are or should be requiring from their ob- 
servers. 

Second symptom. Failure to observe the 
distinction between the introspectionist and 
behaviorist interpretations also shows up in 
unclarity as to what experiments like M 
attempt to measure. On the introspectionist 
interpretation this sort of measurement 
consists in assigning numerals to a psycho- 
logical magnitude privately observed and 
quantitatively estimated by O. On a be- 
haviorist interpretation it consists in quan- 
titatively defining a psychological magnitude 
in terms of O's quantitative observations of 
a class of physical stimuli. Again we shall 
refer to Stevens to illustrate the vacillation 
between and unclarity regarding these two 
positions. In a 1936 article (Stevens, 1936, 
pp. 406-408) we find him saying: 


... scale numbers [should bear] a reasonable 
relationship to the experience of the observer. 
Thus, the scale would be satisfactory if the magni- 
tude of the attribute of sensation to which the 
number 10 is assigned should appear half as great 
to the experiencing individual as that to which the 
number 20 is given, and twice as great as the 
magnitude to which the number 5 is given. ... 
the subjective judgements (responses) of the
observer must provide the ultimate test of the
validity of the numbers on the scale as representa- 
tive of degrees of loudness [in general, sensation]. 
The utilization of the observer's discriminations 
in this way presupposes, of course, that he is 
capable of making valid judgements of the numeri-
cal ratio of one impression to another.


These passages contain one of the strong- 
est suggestions of an introspectionist view 
of ratio scales known to the present writer. 
Stevens says that the scale is satisfactory if 
“sensations appear ... to the experiencing
individual” to have those ratios which are
indicated by the numerals assigned to them
by the experimenter; and that the ultimate
test for this result is the “subjective judge-
ments ... of the observer.” To speak in this
way is to imply that the entities being 
measured are private sensations, inner 
events of which the observer—but not the 
experimenter—is aware through a faculty 
of inner perception, through introspection. 
In addition, the last sentence quoted con-
tains one of the most explicit uses of the
assumption that O’s sensation estimates are 
accurate, an assumption which is the second 
premise in the introspectionist argument, 
(Ca)-(Cb)-(CI), discussed earlier. 

It is surprising to hear these suggestions
from a writer who is known for his attempt
to apply the philosophy of operationism to
psychology (Stevens, 1939). It is even more
remarkable to find these suggestions in the
very same pages where Stevens says: “in
case of sensations what we want is a scale 
for the measurement of some aspect of the 
response of a living organism to a certain 
class of stimuli” (p. 406), “a subjective 
scale is a scale of response” (p. 407), and 
“loudness is a name which we give to a 
certain class of discriminatory responses” 
(p. 408). These statements suggest a be- 
haviorist program of defining private sensa- 
tions, a program laid out in greater detail 
in the previous year. 


Since sensation cannot refer to any private or
inner aspect of consciousness which does not show
itself in an overt manner, it must exhibit itself
to an experimenter as a differential reaction on
the part of an organism. ... Thus, the sensation
red is a term used to denote an ‘objective’ process
or event which is public and which is observable
by any competent investigator. ...In the same 
way that sensation denotes a class of reactions 
which satisfy certain criteria, attribute of sensa- 
tion denotes a sub-class which satisfies more re- 
stricted criteria [Stevens, 1935, p. 524]. 


These passages are puzzling. Stevens says 
that an E may concern himself with a 
“private aspect of consciousness" so long as 
it “exhibits itself...as a differential reac- 
tion.” Isn’t this to cling to the private and 
unmeasurable entity while trying to rid
oneself of it? It is one thing to say that 
sensations which are defined as differential 
reactions can be measured, quite another 
to say that private sensations which are 
exhibited as differential reactions can be 
measured. To say the latter suggests that it
is the private sensation we really ought to
measure—but that since we cannot we can 
only do second best and measure a public 
substitute, that is, the differential reaction.
Now the pure behaviorist does not cling to 
private sensations, nor try to measure them 
in terms of substitutes. He holds that if 
there are any private entities, they are 
absolutely of no interest or importance to 
mathematical science. His position is that 
certain theoretical constructs may be defined 
in terms of quantitative observer estimates, 
which constructs may then be said to be 
“measured”; and that we may call these 
constructs “sensations” if we choose. But 
they are not to be thought of as substitutes 
or representatives for some sort of private 
entities. These constructs must stand on 
their own feet, in virtue of their scientific 
usefulness, and not in virtue of going proxy 
for some private entities which the scientist 
unfortunately cannot observe. 

The passages analyzed above are taken 
from writings early in Stevens’ career. It is 
not unreasonable to expect that in the inter- 
vening 30 years he would have clarified his 
position on the interpretation of ratio scaling. 
And we do find some evidence of this. In 
Stevens (1958a, p. 386) he says that since the 
“non-operational aspects of sensation” are
“inaccessible,” the “operational stance is
indispensable to scientific sense and mean- 
ing” in psychology. It follows, he thinks, 
that “verifiable statements about sensation 
become statements about responses.” In 
1959 we can read (Stevens, 1959b, pp. 12, 15) 
that “immediate experience” with its privacy 
is not the object of the science of sensation. 
Sensation, we are told, is, like temperature, 
“a construct, a conception built upon the 
objective operations of stimulation and 
reaction. We study the responses of or- 
ganisms, not some nonphysical stuff that by 
definition defies objective test.” In 1960 
Stevens (1960b, p. 226) cheerfully concedes 
that we cannot measure the “strength of a 
sensation” in the “inner, private, subjec- 
tive” sense; but he insists that there is 
another sense of “sensation strength” which 
makes it possible for us to ask “sensible 
objective questions about the input-output 
relations of sensory transducers ...”

These passages clearly suggest a behavior-
ist interpretation of psychophysical measure-
ment. But it is one thing to suggest
a position, another to describe it in detail— 
to lay it out for critical scrutiny and examine
its implications. It is, therefore, quite im- 
possible to say what positions Stevens 
would take on many of the points discussed 
in this paper. Furthermore, we can still find 
significant evidence of an introspectionist 
view of the object of psychophysical meas- 
urement in Stevens’ later writings. Some of 
these passages have been provided in earlier 
sections. One more is worth mentioning 
(Stevens, 1956, pp. 24-25): 


Sensations do not come with numbers written 
on them, and when we try to assess the ratio 
between a pair of them we find ourselves up against 
a difficult task of appraisal. It is no wonder then 
that subtle constraints and biases can influence 
the result. This is another way of saying that the 
outcome is a function of the method—as it always 
is in science. What we want, of course, is an un-
biased method, one that on the average lets O
make an estimate that is neither too high nor too 
low. Since we do not know in advance what his 
estimate should be, we can apply no independent 
criterion of validity.


This passage says that in a ratio-scaling
experiment O estimates sensations and E
attempts to assign the correct numerals to
these sensations. Stevens implies that both
tasks are difficult. This can only be because
the sensations have not been previously
quantified, which makes it difficult for O;
and because the sensations are private to O, 
which makes it difficult for E. Stevens’ 
lamenting the lack of an “independent 
criterion of validity” can only be understood 
by assuming that he regards the scale whose 
validity is in question as a scale of a private 
magnitude! 

Hirsh (1952, pp. 4-6) provides us with 
another example of confusion about the 
object of psychophysical measurement. He 
begins by noting the difficulty of measuring 
private entities. 


The end product of our several sensory systems 
is the sensation—auditory, visual, tactual, ol- 
factory, or gustatory. Each of us knows what a 
sensation is, because each of us has sensations. 
It is difficult, however, for each of us to know
about another person’s sensations, because we 
cannot get inside his world of experience very 
easily. You and I may both say ‘red’ when we see 
a particular object; but you cannot be sure that
my impression of red is exactly the same as yours.
... We cannot observe the sensations of others, 
and we can only measure what we can observe. 


He then suggests a way out of the difficulty. 


Since we cannot observe the sensation that 
exists in another individual’s world of experience, 
it would seem indeed that we cannot measure 
sensation. On the other hand, we can twist the 
meaning slightly and define the sensation in terms 
of events that we can measure. When a man says,
“I see red,” we cannot measure the redness of his 
visual sensation, nor even be sure that he has one, 
but we can observe his verbal behavior: “I see
red.” The phenomena of audition may be studied
in the same way. We cannot measure auditory 
sensations that are private, but we can measure 
sensations that are defined in terms of behavior 
or observable responses.


These passages clearly contain a behavior-
ist-like interpretation of psychophysical
measurement. But, in the first place, they 
are extremely sketchy and vague. How, 
precisely, are sensations to be defined? In 
terms of responses to physical stimuli, or in 
terms of responses to private sensations? 
If the latter, then the suggestion is still in- 
trospectionistic. And how does Hirsh under- 
stand "definition"? Does he regard the 
statements which define sensation as stipula- 
tive definitions whose only defense is their 
usefulness? If not, then his suggestion may 
still contain introspectionist elements. 

In the second place, Hirsh seems, even 
more clearly than Stevens, to cling to the 
private entity even while trying to expel it. 
He laments our inability to “get inside" 
another individual's world of experience, as 
if that is what the psychologist really wishes 
to do. He says that the end products of our 
sensory systems are private sensations, that 
we cannot measure these, and that we must 
settle for measuring something closely re- 
lated, something defined in terms of behavior 
or observable responses. The behaviorist 
does not attempt to measure private entities 
in terms of public substitutes. His view is 
that the end products of our sensory systems 
should be regarded either as physiological 
processes or as behavioral responses, not as 
private sensations. The measurement of
physiological processes is accomplished, not 
by procedures like those in Experiment M, 
but by tapping the organism with instru-
ments, like oscilloscopes. As for behavioral
responses, they may be measured in either
of two senses. On the one hand, certain
attributes of responses—duration, loudness,
etc.—can be measured in the ordinary
physical sense by using clocks, meters, etc.
On the other hand, certain theoretical con-
structs can be defined in terms of quantita-
tive behavioral responses, and the entities
thus defined can then be said to be meas-
ured. But these entities should not be
thought of as substitutes for the private 
entities of which Hirsh speaks. 


The Importance of Distinguishing the Interpretations


The previous section was designed to 
indicate that the distinction between the 
introspectionist and behaviorist interpreta- 
tions of ratio scales has more than academic 
interest. In spite of its importance, there is 
almost universal failure even to suggest the 
distinction, much less explicitly to draw it. 
And there is a universal tendency, found 
even among those who mention the distinc- 
tion, to run the two interpretations together. 
The explanation for this confusion may be as 
follows. The most natural and attractive 
way of viewing ratio scaling in particular 
and psychophysical measurement in general 
is to see it as the attempt to quantify pri- 
vately observable, empirically real magni- 
tudes. But the consequences of this view are 
that psychological magnitudes cannot be 
measured nor psychophysical laws verified 
in an acceptable scientific manner. The 
philosophically sensitive psychophysicist rec- 
ognizes these consequences, and seeks to 
avoid them by adopting a behaviorist inter- 
pretation of psychophysical measurement. 
He concedes that although the private 
psychological magnitudes are incapable of 
scientific treatment, psychological magni- 
tudes as defined in terms of observer re- 
sponses can be measured and laws regarding 
them verified. 

However, it is difficult to discard the 
introspectionist interpretation completely, 
and for more than one reason. In the first 
place, this interpretation is the more natural 
and attractive of the two, both because it has 
a longer history and also because it is, for 
some deeper reason, intellectually the most 
satisfying. Secondly, the psychophysical
experimenter tends to be impatient with
philosophical and methodological considera-
tions, and to be unwilling to lay the be-
haviorist definitions out in detail and to
examine their implications with care. But
more fundamental than either of these
reasons is the logical similarity between the
theoretical fictions of the behaviorist inter-
pretation and the empirically real, ob-
servable psychological magnitudes of the
introspectionist.
Although psychological length as defined
in (i) and (ii) is merely a shadow of its
introspectively observable counterpart, it is
still a magnitude: one can still speak of Ψs
as being greater than, equal to, and less than
one another. And although the principle of
correspondence is, on the behaviorist view,
merely a stipulative definition, still it re-
ceives the same formulation as the intro-
spectionist principle. These similarities pro-
duce a strong tendency to slip from the
behaviorist interpretation back into the
introspectionist, or, what is virtually the
same error, to treat psychological length as
defined in (i) and (ii) as a substitute, or
proxy, for the privately observable, scien-
tifically unmanageable psychological mag-
nitude posited by the introspectionist. This
tendency manifests itself in a number of
ways, one of the most important of which is
to think of psychological length as a single
magnitude which observers may estimate
in a number of different ways. Thus O's
one-half and one-fourth estimates in Experi-
ment M are thought of as different fractional
estimates of entities which lie along a single
continuum. And jnd, interval, and ratio
scales are thought of as different scales of a
single magnitude, so that if neither scale
agrees with any of the others then only one
of them can be “valid.”
Failure to distinguish the introspectionist
from the behaviorist interpretation produces
vacillation between the two, and serves to
conceal the difficulties in both. Psychological
length is on the introspectionist interpreta-
tion an “empirically real” but scientifically
unmanageable magnitude; on the behavior-
ist a scientifically manageable but “fic-
tional” magnitude. Running the two inter-
pretations together creates the delusion that
psychological length is both “empirically
real” and scientifically manageable. On the
introspectionist interpretation Psychophysi- 
cal Law (5) is a description of O's perceptual 
mechanism; but it is unverifiable since only O 
can observe Ψs. On the behaviorist interpre-
tation (5) is verifiable in the standard scien- 
tific manner, but it is not an explanation of 
how O perceives physical length. By amalga- 
mating the two interpretations, (5) seems to 
emerge both as publicly verifiable and as an 
explanation of O's perceptual behavior. But 
the amalgam is, of course, unstable, since it 
incorporates contradictory elements. 

Only when we carefully distinguish the 
two interpretations, laying each out in detail 
as we have done in previous sections, does 
it become possible to assess ratio-scaling 
procedures. It seems clear that the mak 
scale is illegitimate if interpreted intro- 
spectionistically. The O observes no private 
psychological entities; and even if he did, 
the mak scale would provide us, not with 
measurements, but only with estimates of 
these entities. Furthermore, Psychophysical 
Law (5) is unverifiable on this interpretation: 
both because it rests on principle of corre- 
spondence (Cl), which in turn rests on un- 
verifiable assumptions (Ca) and (Cb); and 
also because O's (putative) quantitative 
estimates of psychological length are uncon- 
firmable. (We must say “unverified assump- 
tions” and “unconfirmed estimates” where O 
is thought to estimate a physiological 
magnitude.) 

If we adopt a behaviorist interpretation, 
these positive objections no longer apply. 
Rather the objection becomes the negative 
one that the scaling of psychological length 
is unnecessary, because the concept of 
psychological length is unnecessary. In a 
previous section it was tentatively argued 
that any legitimate purpose in constructing 
a ratio scale of psychological length can be 
equally well achieved without employing 
the concept of psychological magnitude. If 
this is so, then the concept is dispensable 
and may be abandoned. And there are 
reasons for thinking that it should be aban- 
doned. One of these is simply Occam’s 
maxim, which says: Do not multiply entities 
beyond necessity. Put in more contemporary 
fashion: Do not clutter up the theoretical 
system with unnecessary constructs. If this
reason seems insufficient, we may point out
the confusion caused by the constant temp- 
tation, described earlier, for the behaviorist 
to slip back into an unacceptable, introspec- 
tionist way of thinking about psychological 
magnitudes. Other reasons emerge by con- 
sidering alternatives to those systems which 
include the concept of psychological magni- 
tude. 


Beyond the Two Interpretations 


It may seem that to abandon the concept 
of perceptual magnitude is to abandon 
perceptual psychophysics. This is not so. It 
is rather to adopt a view of the nature of 
perceptual psychophysics which differs both 
from that of the “old” psychophysics of 
Fechner and the “new” psychophysics of 
Stevens. Whatever their differences, both 
theorists view perceptual psychophysics as 
the attempt to measure perceptual magni- 
tudes in order to discover mathematical 
laws relating these to stimulus magnitudes. 
So it is clearly unorthodox to recommend 
doing away with the concept of perceptual 
magnitude. But it is one thing to embrace 
an unorthodox view of the nature of per- 
ceptual psychophysics, and another to 
abandon the science. The most promising 
unorthodox view would replace the concept 
of a perceptual magnitude with that of a 
perceptual, or discriminatory, ability. Per- 
ceptual psychophysics then becomes the 
experimental discipline which describes, 
measures, predicts, and perhaps even ex- 
plains the perceptual abilities of organisms. 
Let us examine the consequences of applying 
this view of the science to Experiment M. 

M is really an attempt to determine O's 
ability to make fractional estimates of rod 
length by sight. The O's ability to do this 
may be perfect, or it may be less than perfect 
in varying degrees. These degrees of ability 
can be represented in psychophysical func-
tions of the form Φ_c = bΦ_s. If O were able
to quarter lengths perfectly the value of 
b would be .25. If he were able to halve 
lengths perfectly the value would be .50. 
To the extent that b falls below this value, 
O underestimates the comparison rod (over- 
estimates the standard). To the extent that 
b exceeds this value, O overestimates 
the comparison rod (underestimates the
standard). Thus the value of b provides a
measure of O's ability to make fractional
estimates of length. This measure can be
obtained merely by constructing the Φ-Φ
functions of Figure 1. The Ψ-Φ functions of
Figure 2 are not only unnecessary; it is
difficult to see immediately what connection
they have with perceptual ability.
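
The computation of b described above may be sketched as follows. This is only an illustration under stated assumptions: the function name fit_b, the choice of a least-squares fit through the origin, and the rod lengths are hypothetical and are not taken from Experiment M. Only the form of the function (comparison length equals b times standard length) and the perfect-halving value of .50 come from the text.

```python
def fit_b(standards, comparisons):
    """Least-squares slope b for comparison = b * standard (line through the origin).

    Assumption: a through-the-origin least-squares fit is one reasonable way
    to summarize O's settings; the text does not prescribe a fitting method.
    """
    if len(standards) != len(comparisons):
        raise ValueError("need one comparison setting per standard")
    denominator = sum(s * s for s in standards)
    if denominator == 0:
        raise ValueError("at least one standard length must be nonzero")
    numerator = sum(s * c for s, c in zip(standards, comparisons))
    return numerator / denominator


# Hypothetical rod lengths (arbitrary units), NOT data from the paper.
standards = [10.0, 20.0, 40.0]    # physical lengths of the standard rods
comparisons = [5.8, 11.6, 23.2]   # lengths O sets as "half" of each standard

b = fit_b(standards, comparisons)
perfect_halving = 0.50            # value of b for a perfect halver, per the text

# Following the text's convention: b above the perfect value means O
# overestimates the comparison rod; b below it means he underestimates it.
if b > perfect_halving:
    verdict = "overestimates the comparison rod"
elif b < perfect_halving:
    verdict = "underestimates the comparison rod"
else:
    verdict = "halves perfectly"

print(f"b = {b:.2f}: O {verdict}")
```

Note that nothing in the sketch refers to a psychological magnitude: both inputs are physical lengths, which is the point of the argument above.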

The discovery of Φ-Φ functions makes it
possible for us to compare O's ability to
make fractional estimates of one-fourth
with his ability to make fractional estimates
of one-half, one-third, two-thirds, etc. O
overestimates the comparison rod in the first
part of Experiment M by (.34 - .25)/.25, or
36%. He overestimates the comparison rod
in the second part by (.58 - .50)/.50, or 16%.
This makes it possible for us to say not only 
that O’s ability to fourth lengths is not as 
great as his ability to halve them, but also 
how much greater the one is than the other. 
What this difference in abilities shows is not 
our concern here. But one might wish to 
consider the hypothesis that other things 
than length are among O's cues, or that he 
does not comprehend fractions perfectly, or 
even that different physiological mechanisms 
are involved in the two fractionations. As we 
saw in an earlier section, the Φ-Φ functions
also make it possible for us to compare O's
ability to make ratio estimates with his 
ability to make interval estimates of length. 
Again, as we saw previously, by comparing 
psychophysical Φ-Φ functions with their
associated physiological 6-8 functions, we can 
make inferences regarding the physiological 
mechanisms underlying O's ability to make 
ratio estimates. 
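
The two percentages quoted above for Experiment M can be checked with a short sketch. The function name relative_overestimate is an assumption made for the example; the values .34, .25, .58, and .50 are those given in the text.

```python
def relative_overestimate(obtained_b, perfect_b):
    """Proportion by which the obtained b exceeds the value for a perfect estimate."""
    if perfect_b == 0:
        raise ValueError("perfect_b must be nonzero")
    return (obtained_b - perfect_b) / perfect_b


# Values reported in the text for the two parts of Experiment M.
quartering = relative_overestimate(0.34, 0.25)
halving = relative_overestimate(0.58, 0.50)

print(f"quartering: {quartering:.0%}")  # quartering: 36%
print(f"halving:    {halving:.0%}")     # halving:    16%
```

The parentheses matter: the subtraction is carried out before the division, which is how the figures of 36% and 16% are obtained.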

The point is not that perceptual capacities 
can be measured and studied only through 
the discovery of psychophysical Φ-Φ func-
tions. Functions which relate number of
psychological intervals to stimulus magni- 
tude (like the jnd function in Figure 2) are 
also useful. And still other sorts of function, 
as well as a variety of statistical methods, 
can be employed by E. We merely wish to
suggest that in perceptual psychophysics 
every legitimate question can be raised and 
answered, and every legitimate comparison 
and inference can be made, by employing 
the concept of a perceptual ability. If this is 
so, then perceptual psychophysics can be
just as rich, interesting, and productive when
construed as the science of perceptual 
abilities as it is when construed, in the man- 
ner of Fechner and Stevens, as the science 
of perceptual magnitudes. Furthermore, our 
unorthodox view has the positive advantage 
of laying to rest those nagging philosophical 
doubts about the possibility of psychological 
measurement which have attended psycho-
physics since Fechner founded the science.

There is no philosophical problem of 
psychological measurement for the science 
of perceptual abilities. On the orthodox 
view, psychological measurement is regarded 
as the measurement of psychological magni- 
tudes. But there is a question as to whether 
such measurement is possible. If psychologi- 
cal magnitudes are private, as the intro- 
spectionist believes, then they are, it would 
appear, incapable of measurement. If we 
define psychological magnitudes in terms of 
observer responses, as the behaviorist does, 
then they appear capable of measurement. 
But this solution seems to allow the prize 
to slip through our fingers. What we really 
wished to measure was an empirically real, 
although private, dimension of mind; but 
all we managed to measure was an anemic 
substitute, a theoretical construct or fiction. 
We really wanted to discover the relation 
between a psychological magnitude corre- 
lated with but defined independently of a 
physical magnitude; but instead we were 
forced to define the psychological magnitude 
in terms of (observer responses to) physical 
magnitude. 

These difficulties are completely circum- 
vented by construing psychological measure- 
ment as the measurement of perceptual 
abilities. For when we abolish psychological 
magnitudes, such as psychological length, 
psychological weight, etc., then there can be 
no question of how we measure them. Viewed 
as an attempt to determine ability to frac- 
tionate length, Experiment M consists of 
(a) measuring the lengths of the rods to be 
presented, (b) eliciting quantitative esti- 
mates of rod length from O, and (c) com-
puting the Φ-Φ functions in Figure 1. No-
where in these three stages do we find any
attempt to measure psychological magni- 
tudes. Speaking loosely, we may say that 
(a), (b), and (c) constitute a complex proce-
dure for measuring O's ability to fractionate
physical length. But strictly speaking, the
only measurement which occurs in M is the 
measurement of physical length in stage (a). 
And since there is no problem in measuring 
physical length, there is no problem of 
measurement when M is regarded as an 
attempt to determine one of O's perceptual 
abilities. 

In the “new” psychophysics the old 
philosophical doubts about the possibility
of psychological measurement reappear with 
a somewhat different emphasis, that is, as 
doubts about the possibility of determining 
which of the various types of psychological 
scale is "valid." In the attempt to scale 
perceptual magnitudes such as psychological 
length, a problem arises over the validity of 
competing scales. Shall we employ a jnd, an 
interval, or a ratio scale? On the intro- 
spectionist interpretation this problem is 
unavoidable and insoluble. Each scale is con-
structed from what are taken to be O's esti-
mates of private psychological entities.
Since there is no way of confirming such 
estimates, there is no way to determine 
whether the scales built from them accu-
rately represent the magnitude of the private
entities. On the behaviorist interpretation, 
there still seems to be a problem. Psychologi- 
cal magnitudes are defined in terms of O's 
publicly observable responses to physical 
stimuli, so that the privacy problem is 
solved. But do O's jnd responses, his interval 
responses, and his ratio responses to length 
define three different psychological magni- 
tudes, three different psychological lengths?
It seems odd to say that they do, since they
are all responses to the same physical mag- 
nitude. On the other hand, if a single psycho- 
logical magnitude is involved, then scales 
constructed from the different responses 
compete with one another and we must 
choose between them. And how shall we do 
that? 

Where we take ourselves to be scaling 
perceptual abilities, these difficulties do not 
arise. There is no philosophical problem 
concerning the validity of psychological 
scales for the science of perceptual abilities. 
It is obvious that a scale constructed from 
the partition estimates of an observer can at 
best provide a measure only of his ability to 
make partition estimates: it cannot provide 
a measure of his ability to make ratio esti- 
mates. To deny this would be like asserting 
that a scale of a person's muscular ability to 
lift rods can provide a measure of his ability 
to discriminate rods, or that a scale of his 
reading ability can provide a measure of his 
mathematical ability. Of course it may be 
necessary to choose between different meas- 
ures of perceptual ability. For example, after 
discovering the &-& functions of Figure 1, we 
must decide whether O's ability to frac- 
tionate lengths is best represented by the 
difference between the obtained value of 
b and the value for a perfect estimate 
(.25 for quartering, .50 for halving) or by 
the ratio between these values. But this 
decision can be made on the basis of scien- 
tific convenience and usefulness. It involves 
no philosophical problems. 
OPERATIONS OR WORDS?
S. S. STEVENS 
Laboratory of Psychophysics, Harvard University


A comment on Psychological Monograph 627, Part I, by C. Wade Savage,
Introspectionist and Behaviorist Interpretations of Ratio Scales of Perceptual
Magnitudes.


WHENEVER his choice of words causes
misunderstanding, the operational
scientist ought perhaps to invert the
well-known paternal dictum and advise his
readers to follow what he does, not what he says.
A chief leverage of the operational stance 
resides in its ability to sustain the concepts 
of science, regardless of the vagaries of 
changing styles in verbal behavior. Under 
the drive to keep abreast of the evolving, 
turbulent jargon of science, we alter our 
habits of speech with less and less concern 
for tradition, but the useful meanings of 
words continue to rest with the operations 
to which the words refer. If that point of 
view, so basic to science and so natural to
most scientists, had been adopted by C. 
Wade Savage (1966), perhaps his schol- 
arly review of the ratio scaling of percep- 
tual magnitudes would have assumed a 
more mellow cast. A good operationist, I 
think, would have shown little shock and 
alarm concerning my many verbal vacilla- 
tions as I have sought over the years to 
convey to different audiences a feeling for 
the goal of psychophysics. Yes, if Savage 
would attend to what I do and not to what 
I say, there is a good possibility that he 
might come to share an enthusiasm for the 
psychophysical power law and the several 
scientific questions it has illuminated. It 
is what we do in the experiments that mat- 
ters most, not what words we use to de- 
scribe our results. For words are made to 
serve us, not to rule over us. 

Savage has produced an erudite and
forensic endeavor that deserves to be
scrutinized with care, as he himself has
scrutinized his many references. His alarm at my
verbal transgressions is only incidental. The 
broader purpose is to examine the implica- 
tions of two opposing points of view, called 
introspectionist and behaviorist. They are 
idealized points of view, not necessarily
representative of any psychophysicist, liv- 
ing or dead. The two views are verbal crea- 
tions, each posing as an active agent that 
is able to make interpretive statements 
about psychophysical experiments. Thus 
the introspectionist says that psychologi- 
cal magnitudes are observable but private; 
whereas the behaviorist says that they are 
public but nonobservable. (I myself would 
say that they are public and observable, 
which would serve to justify one of Sav- 
age’s complaints, that I refuse to fit neatly 
into his categories.) 

The fabricating of a straw-filled protag- 
onist—or even a pair of them—can provide 
good entertainment as well as an occasion 
for the clarification of meanings at the syn- 
tactical level. Since there exist few natural 
constraints on the stuffing of verbal straw, 
the introspectionist and the behaviorist can 
be pictured as extreme types, exaggerated 
and caricatured, true to no actual man. 
They may then become personae through 
which the author speaks, and since they 
can be cast in antithetical roles, their verbal 
jousting can have the form if not the sub- 
stance of empirical debate. There need be 
no harm in such logomachy, unless perhaps 
the word game happens to deceive its own 
creator. That, in fact, is what may have oc- 
curred. 

Having sketched the models of the two 
“interpretations,” introspectionist and be- 
haviorist, Savage turns his attention to the 
phrasings of various psychophysicists and 
finds them wanting. He who writes about a 
psychophysical experiment may be hanged, 
we are led to believe, merely for the words 
he uses. If he says the listener judged the 
apparent magnitude of sensation, he is 
guilty of the introspectionist interpretation. 
If he says the listener compared a variable
stimulus to a standard stimulus, he
commits himself to the behaviorist view. Since
I myself, in striving to effect a reasonable 
degree of communication with other people, 
have used many different phrasings, I am 
said to writhe in conflict and to vacillate 
in indecision. I beg to plead guilty on both 
counts, but not for the reasons given; my 
conflicts and vacillations usually concern 
matters of greater substance than mere 
words. But that is just the point: what
consequence hinges on the words we use? A
rose by any other name. ... 

The prime reason for using care in the 
choice of words is to smooth the flow of 
communication. We want if possible to use
the word that correctly taps the reader’s 
semantic set and speeds him to a grasp of 
the operations referred to. Replying to 
my Aunt Emma’s query about what I 
do at Harvard besides teach, I tell her 
that I measure people’s sensations—for in- 
stance, what they hear and how loud it is. 
She once answered, “Yes, I had my hear- 
ing measured recently.” I let it go at that. 
A description in terms of the measuring of 
sensations, particularly the loudness of 
noises, is also a phrasing that may speed 
communication with an acoustical engineer. 
With some of my philosopher friends, how- 
ever, I prefer to say that I try to determine 
the relative responses of sensory systems 
by means of cross-modality matching. If 
I use the words sensation or subjective, 
some philosophers leap to the verdict that 
my task of measurement is impossible on 
the face of it. As these examples suggest, 
it is the reader's semantic set that deter- 
mines how a phrase will sit, and the writer 
is usually powerless to mold his reader's 
vocabulary into some preconceived ideal. 
So the writer casts about for the apt de- 
scription, and on different occasions he is 
likely to say it in different ways. 

I am impressed by the care with which 
Savage has tabulated examples of my
behavioristic warp and my introspectionistic
woof. My variations in phrasing are found 
to be symptoms of “unclarity.” My the- 
saurus seems to have failed me. I am said 
to vacillate back and forth. Actually, it 
seems to me that the tabulation of my
phrasings shows evidence of a drift, a kind
of upward trend toward freer expression. 
It had not been apparent to me before, 
but the passing of three decades has in- 
deed made a difference. When behavior- 
ism was new and still self-conscious about 
its words, there were taboos and
prohibitions, as though we could lay ghosts merely
by rubbing out their names. But the devel- 
opment of operationism, aided by the se- 
mantic analyses contributed by some of the 
logical positivists, has led to the view that 
verbal taboo, like word magic of all sorts, 
is unworthy of the scientist. It is not what 
we say, it is what we mean that counts, 
and to get at those meanings we must 
go behind the words to the operations indi- 
cated. The scientist in his contest with na- 
ture depends not on words but on opera- 
tions—methods, procedures, inventions, 
devices, manipulations, the whole parapher- 
nalia of science. Now that I have come to 
heed the counsel of the operational view, 
whenever I measure subjective sensation 
I am apt to say so in just those words. 
What do I mean? Let us consider briefly 
a paradigm of the operations that are
intended, a paradigm discussed at greater
length elsewhere (Stevens, 1966a). 


THE MATCHING OPERATION


In the study and measurement of
perceptual magnitudes we utilize acts of
judgment as one of the basic operations. But
what is judgment other than a process of 
comparing, equating, or matching? Con- 
sider an example. In measuring luminance, 
the photometrist changes the brightness 
of the comparison field until he judges it to 
be equal to the brightness of the target 
field. He behaves as a comparator. If you 
alter the target field, he will adjust the 
comparison field accordingly. His judgment 
is clearly a matching operation. If we ex- 
tend this matching paradigm to other 
kinds of judgment, we find that the core of 
all judging operations is a matching—a 
coupling or conjoining of an element from 
one domain with an element from another 
domain. It is hard to envisage an act of 
judgment that cannot be so construed. 

Next let us consider the particular type 
of judgment that is basic to the
measurement of sensation. A scale of loudness can
be constructed by asking observers to pro- 
duce a match between loudness and per- 
ceptual values on one or more other con- 
tinua. Many such cross-modality matches 
have been made, and the results have been 
mapped into a self-consistent family of 
functions. 

Examples of such functions are shown in 
Figure 1, where the results obtained by 
matching loudness to 10 other continua are 
displayed in log-log coordinates (Stevens, 
1966b). Perhaps the most provocative fea- 
ture of the matching functions is their 
transitivity: any two matching functions 
can be used to predict a third. Thus the 
slopes of two of the lines, for example, those 
relating loudness to handgrip and loudness 
to brightness, allow us to predict the slope 
of the matching function when observers 
make a direct match between handgrip and 
brightness. 
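
The transitivity property can be illustrated with a short sketch. It assumes that each matching function is a straight line in log-log coordinates, as in Figure 1, so that equating the two loudness lines yields a third line whose slope is the quotient of the first two. The function name and the numerical slopes are hypothetical and are not read from the figure.

```python
def predicted_cross_slope(slope_a: float, slope_b: float) -> float:
    """Predict the log-log slope for a direct match of continuum A to B.

    Assumes both continua were matched to a common reference (loudness):
        log L = slope_a * log A + c_a
        log L = slope_b * log B + c_b
    Setting the right-hand sides equal gives
        log A = (slope_b / slope_a) * log B + constant.
    """
    if slope_a == 0:
        raise ValueError("slope_a must be nonzero")
    return slope_b / slope_a


# Hypothetical slopes, chosen only for illustration (not taken from Figure 1).
slope_handgrip = 2.0
slope_brightness = 0.5

# Predicted slope when handgrip is matched directly to brightness.
print(predicted_cross_slope(slope_handgrip, slope_brightness))
```

A check of transitivity would compare this predicted slope with the slope actually obtained when observers match the two continua directly.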

Any one of the functions in Figure 1 
could be used to define the loudness scale, 
the so-called sone scale, for the choice of a 
reference function, like the choice of a unit, 
is basically arbitrary. Contrary to what 
is sometimes assumed, the matching of 
loudness to number enjoys no privileged 
status in the family of matching func- 
tions. It is merely another cross-modality 
function. If you take that function away, 
where are we left? Essentially right where 
we were. We can then define the sone 
scale in terms of matches between loudness 
and some other continuum, apparent 
length, say. All the engineering applica- 
tions for which the sone scale has proved 
useful would still be open to us. 

I have discussed the creation of so-called 
subjective scales in terms of the matching 
operation because it seems to present a 
better paradigm than the example offered 
by Savage, which was the scaling of ap- 
parent length by means of the now obsoles- 
cent procedure of fractionation. Fractiona- 
tion also involves a matching process, as 
I have noted elsewhere (Stevens, 1966a),
but, since the matching involves prescribed 
ratios, fractionation exhibits complexities 
not shared by direct cross-modality
matching. As a consequence, discussions of
fractionation have often swerved toward
tangential issues, issues that obscure the prime
concern. The focal issue centers on the use 
of the observer as a comparator. We meas- 
ure a process in the observer, such as per- 
ceived magnitude, by systematic study of 
his behavior as a matching comparator— 
his behavior as a balancer, equater, or 
conjoiner. 

Perceived magnitude, sensory intensity, 
or whatever you care to call it, becomes, 
then, a construct. But that is not surprising, 
because all useful empirical concepts are 
constructs. Concepts vary, of course, in the 
specificity of their operational basis, and 
the direction of progress for many scien- 
tific constructs is toward increasingly pre- 
cise and determinate definition. The 
concept of perceptual magnitude rested ini- 
tially on the operations of verbal response 
(matching with adjectives), exemplified by 
such phrases as strong taste, faint smell, 
blinding light, soft sound, and so forth. 
With the invention of devices to control 
stimuli, it became possible to turn the 
level of a sound up and down and to note 


[Figure 1. Axis labels: sound pressure level in decibels; relative intensity of criterion stimulus.]

Fig. 1. Examples of equal-sensation functions
obtained by matches between loudness and per- 
ceptual values on various other continua. The 
straight lines through the data define power 
functions in the log-log coordinates. The slope 
of the line gives the exponent of the power func- 
tion. In most of the 10 experiments, groups of 10 
or more observers adjusted the level of a sound 
to match criterion stimuli presented in irregular 
order in another sense modality. For handgrip, 
vocal effort, and length, fixed sound levels were 
set, and the observer varied the other stimuli 
to match the loudness. For the function labeled 
number, the listeners matched loudness to a series
of numbers spoken by the experimenter. 
the apparent change in loudness. It was
promptly observed that a logarithmic in- 
crease in sound level did not seem to pro- 
duce a linear increase in loudness, as many 
had thought it should. Thereupon some of 
the pioneering work on loudness scaling 
was undertaken to determine the relation 
between loudness and sound pressure level. 
The functions in Figure 1 represent an 
example of a later stage in the continuing 
effort to specify with ever increasing cer- 
tainty the function that relates subjective 
loudness to stimulus intensity. 
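
The caption to Figure 1 states that a straight line in log-log coordinates defines a power function and that the slope of the line gives the exponent. A minimal sketch of that relation follows, assuming an ordinary least-squares fit on the logarithms. The function name and the data are hypothetical; the data are generated from a known power function so that the recovered exponent can be checked, and they are not experimental results.

```python
import math


def fit_power_exponent(stimulus, response):
    """Fit log10(response) = slope * log10(stimulus) + intercept.

    For a power function response = k * stimulus**n, the slope estimates
    the exponent n and the intercept estimates log10(k).
    """
    xs = [math.log10(s) for s in stimulus]
    ys = [math.log10(r) for r in response]
    n = len(xs)
    if n < 2:
        raise ValueError("at least two points are needed")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic data from response = 3 * stimulus**0.6 (illustrative only).
stimulus = [1.0, 2.0, 5.0, 10.0, 50.0]
response = [3.0 * s ** 0.6 for s in stimulus]

exponent, log_k = fit_power_exponent(stimulus, response)
print(exponent, 10 ** log_k)
```

With real matching data the points scatter about the line, and the fitted slope is then only an estimate of the exponent.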


PERCEPTUAL MAGNITUDE 


Savage's dismay at the failure of some 
of us to use verbal forms that would con- 
sign us consistently to one or the other of 
his categories of "interpretation" seems 
to originate from a belief that words mean 
only what he happens to think they mean, 
not what some of the rest of us think they 
mean. That is a very human tendency, of 
course, but one that may lead to cross- 
purposes when a man's choice of a word 
is mistaken for his "way of thinking." Thus
when I speak of psychological or subjective 
magnitude, I do not refer to the scientifi- 
cally unmanageable private entities so dear 
to the introspectionist, as posited by Sav- 
age. I refer, rather, to a configuration of 
defining operations, a configuration that 
is growing more and more definitive as 
functions like those in Figure 1 become 
better determined. 

It is interesting to note that the
physicist’s definition of the physical intensity
of a sound has also changed rapidly in 
recent years. Half a century ago, it was not 
possible to measure such parameters of an 
acoustic wave as the sound pressure. The 
operations had not been invented. With the 
development of the microphone and the 
vacuum-tube amplifier it became for the 
first time possible to record the minute 
pressure changes produced by ordinary 
sound waves. As knowledge in this area 
has increased, the operational definitions 
have become more precise, but none of the 
definitions are as yet petrified in rigidity, 
nor are they beyond further improvement.

Savage proposes that the concept of
perceptual magnitude is dispensable. I feel
sure that the words themselves are
dispensable, but is that what he means? Or does
he mean that all matching functions, such 
as those in Figure 1, must be filed in the 
wastebasket? If the concept is to be aban- 
doned, what do we do with the configura- 
tions of matching operations on which the 
concept rests? 

Savage complains that my writing lacks 
clarity. I do indeed fall short in my aspira- 
tion to compose impeccable prose, but I 
also have another frailty: I sometimes 
fail to decipher precisely what another 
writer has in mind. Savage says that we 
should replace the concept of perceptual 
magnitude with another concept, namely, 
perceptual ability. My failure here is to 
understand whether the proposal is for a 
verbal substitution or whether it augurs 
a substantive alteration in my laboratory 
experiments. As a corollary, the question 
presents itself whether the matching func- 
tions in Figure 1 are to be construed as 
measures of perceptual ability. If so, then 
we are engaged in a verbal substitution, 
and the question arises whether the term 
ability is a wise choice to convey our mean- 
ing. One is reminded of the debate that 
has swirled about the concept of mental
ability. Is psychophysics about to plunge 
into a similarly sticky morass? Does abil- 
ity mean a potential for performance, or 
merely what a person happened to do on a 
given occasion? Maybe what he did was an 
accident and unrepeatable. What then? 
Contemplation of such questions may well 
suggest that perceptual magnitude is quite 
as amenable to operational definition as is 
perceptual ability. 


BITS AND PIECES


There are several minor queries that sug- 
gest themselves in Savage's essay. I shall 
try to note only a few of them. 

Does the new psychophysics try to
distinguish between estimating subjective
magnitude and estimating physical
magnitude? Yes, it does. From an experi- 
enced photographer you can get two esti- 
mates of the light level, depending on 
whether you ask him to judge the physical 
or the apparent level. And for the apparent 
or subjective value he will give widely
different estimates if you vary the state of
adaptation of his eyes. Similarly, a sound 
engineer can estimate the decibel level of a 
noisy factory, or, under a different Auf- 
gabe, he can judge apparent loudness. In 
an experiment on the judgment of visual 
areas, Martha Teghtsoonian (1965) found 
that magnitude estimations of apparent 
size gave a power function with an
exponent of 0.76. When asked to judge the
physical area, the observers changed their 
basis of judgment, as evidenced by the 
exponent’s increasing to 1.03. 
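
To see what the two exponents imply, consider a doubling of physical area. The sketch below assumes only the power-function form, estimate proportional to area raised to the exponent, with the two exponents reported above; the function name is hypothetical.

```python
def estimate_ratio(stimulus_ratio: float, exponent: float) -> float:
    """Ratio of magnitude estimates produced by a given stimulus ratio,
    assuming estimate = k * stimulus**exponent (k cancels in the ratio)."""
    return stimulus_ratio ** exponent


# Doubling the physical area under each instruction.
apparent = estimate_ratio(2.0, 0.76)   # judging apparent size
physical = estimate_ratio(2.0, 1.03)   # judging physical area

print(round(apparent, 2), round(physical, 2))
```

Under the apparent-size exponent a doubled area is judged roughly 1.7 times as large, whereas under the physical-area exponent the judgment comes out very nearly double.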

In view of the foregoing facts, I do not 
understand why Savage seems to say that 
"the new psychophysicists [do not] possess 
an official distinction between judgments of 
psychological magnitude, and estimates of 
physical magnitude.” There is nothing par- 
ticularly “official” about it, but some ex- 
perimenters are keenly aware of the dis- 
tinction and try to be careful to state it— 
whenever it matters. When it does not
matter, as it often does not, then an observer
may be instructed merely to match
stimulus x to stimulus y, or some such task.
If Savage would try running a number of 
these psychophysical experiments, he would 
probably discover the virtue of framing 
his instructions to the observers with an
eye to ease of comprehension. Thus with 
an attribute like length, it matters little 
whether a person is told to judge the sub- 
jective length, the length, or simply the 
lines. Nor is it necessary to tell him to ig- 
nore the considerable manifold of other 
attributes, such as apparent thickness, 
color, tilt, duration of presentation, etc., 
any one of which could be judged if the 
experimenter happened so to instruct the 
observer. On the other hand, a careful 
wording of the instructions is probably 
needed when the purpose is to elicit match- 
ing judgments for auditory density or audi- 
tory volume (Stevens, Guirao, & Slawson, 
1965). 

My ability to communicate with Savage 
is perhaps nowhere more disappointing than 
on the topic of measurement. Somehow I 
sense in his discussion of measurement the 
atmosphere of a debating contest in which 
the opposition is considered wrong on all
issues, regardless. The measure of
satisfaction that I had sometimes felt at having
introduced the clarifying concept of invar- 
iance into the theory of scales is somewhat 
shattered by the manner in which Savage 
gives that concept the back of his hand. I 
still hope it deserves better, and that the 
theory of scale types will continue to be 
characterized in terms of the group of 
transformations that leaves a scale form
invariant.
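
The notion of invariance can be made concrete with a small sketch. It assumes the usual characterization: a ratio scale admits multiplication by a positive constant, which preserves ratios of scale values, while an interval scale admits a linear transformation with an added constant, which preserves only ratios of differences. The function names and numbers are hypothetical.

```python
def ratio(a: float, b: float) -> float:
    """Ratio of two scale values."""
    return a / b


def difference_ratio(a: float, b: float, c: float, d: float) -> float:
    """Ratio of two differences between scale values."""
    return (a - b) / (c - d)


def similarity(x: float, k: float) -> float:
    """x -> k*x, the transformation admissible for a ratio scale."""
    return k * x


def affine(x: float, k: float, c: float) -> float:
    """x -> k*x + c, the transformation admissible for an interval scale."""
    return k * x + c


values = [2.0, 4.0, 10.0]                       # illustrative scale values
sim = [similarity(v, 3.0) for v in values]
aff = [affine(v, 3.0, 5.0) for v in values]

# Ratios survive the similarity transformation but not the affine one.
print(ratio(values[1], values[0]), ratio(sim[1], sim[0]), ratio(aff[1], aff[0]))

# Ratios of differences survive the affine transformation.
print(difference_ratio(values[2], values[1], values[1], values[0]),
      difference_ratio(aff[2], aff[1], aff[1], aff[0]))
```

The scale type is thus identified by which statements about the numbers remain true under the permitted transformations.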

One final word about words. The use of
stimulus-response language does not make 
a man a behaviorist, nor does the use of 
terms like subjective, or even private
experience, make a man an introspectionist.
It is hardly that simple. You cannot tell
them solely by their spots, at least not by 
the spots they make on paper. A scientist 
who tries to adhere to the operational view 
will insist on looking behind the words to 
discover the occasions and circumstances 
of their use. The term subjective may then 
turn out to mean merely that a living
subject participated in the experiment,
animate rather than inanimate. In a similar
vein, private experience may mean only 
that the reaction in question occurred in 
one particular person. In a similar sense, 
each of the ammeters on the bench before 
me enjoys its own private experience when 
its working element (a coil suspended in 
a magnetic field) performs the compara- 
tor task of indicating the point at which a 
torque produced by magnetic forces bal- 
ances a torque produced by elastic forces. 
Different ammeters behave in different 
ways, and I can study their matching be- 
havior much as I study the matching be- 
havior of people, namely, by mapping the 
input-output matching functions. Each
person’s matching judgment in a psycho- 
physical experiment is a private affair, but, 
in a comparable sense, the comparator 
process in a given ammeter is the private 
affair of that instrument. 

In conclusion, I should like to press a 
precept that seems acutely relevant to the 
study of perception. When we study the 
input-output characteristics of ammeters, 
we do not feel called upon to imagine how 
it feels to be an ammeter, nor do we try 
to relate our own experiences to those of 
ammeters. In the scientific study of man,
especially in the study of the operating 
characteristics of his sensory systems, 
many pseudo problems can be bypassed if 
we take the same objective attitude toward 
the human participant in an experiment as 
we take toward an ammeter. We may then 
carry out psychophysical experiments 
with human subjects much as we perform 
them with animal subjects, claiming no 
privileged view of things merely because 
we, as experimenters, happen to be human.
(If some animal should decide to
experiment on human beings, it ought, no doubt,
to try to follow a similar precept.) In our 
actual, concrete experiments we do, of 
course, proceed as though the human sub- 
ject is an object of study. It is usually 
when we set about to describe what we
have done that troubles arise, for the 
layers of meanings that attach to the 
words we use may then mislead some of 
our readers, even as I seem to have mis- 
led Savage. 
Comparisons between the designs of within- and between-S partial
reinforcement experiments were made as a prelude to 4 experiments in which
rats were given acquisition and extinction training in black and white 
straight runways. For within-S training 1 runway was associated with 
partial reinforcement (PRF) and the other with continuous reinforcement 
(CRF). Between-S groups were also trained in the two runways, the CRF 
group receiving continuous reward and the PRF group partial reward in 
both. Only within-S groups were trained in the first two experiments. In 
Experiment 1 rewards, and in Experiment 2 trials, in the PRF and CRF 
runways were equated. The acquisition data of both experiments were 
similar to previous data showing faster terminal PRF speed in the early 
segments of the response chain and slower terminal PRF speed late in the 
chain although there were indications of an interaction between the reward 
schedule and the color of the runway. In both experiments PRF and CRF 
responding extinguished alike. Within- and between-S groups were com- 
pared in the last two experiments. In Experiment 3 the within-S acquisition 
data were somewhat like those from the first 2 experiments. The between-S 
PRF group, however, did not show higher speeds than the CRF group early 
in the response chain, as is often reported, but were slower in all segments 
of the chain. Half of the Ss were extinguished in 1 runway and half in the other
making all extinction comparisons between Ss. Aside from strong
effects of runway color the within-S groups showed no difference in
extinction of PRF and CRF responding and extinguished more like the between-S
PRF group than like the CRF group. Experiment 4 was similar to Experi- 
ment 3 except that extinction was in both runways. Acquisition and extinc- 
tion findings were similar in their import to those obtained in Experiment 3. 
A concluding discussion points up differences between acquisition and ex- 
tinction data from the within- and between-S situations and attempts to 
reconcile these differences in terms of frustrative nonreward mechanisms. 
For several years we have been studying
a class of phenomena related to succes- 
sive discrimination. This is the kind of 
discrimination in which (as in Pavlovian 
differentiation) one of two discriminanda 
is presented on any one trial and discrimina- 
tion is evidenced by changes in the intensity 
of performance to the positive and negative 
stimuli. The particular aspect of such
experiments which has interested us most is the
one in which the subject (S) is exposed to a
variety of prediscrimination experiences in
relation to discriminanda prior to the actual
differentiation training, and the rate of
discrimination learning (or resistance to
discrimination) is related to these
experiences. The theoretical relevance of this type
of experiment and some experiments
utilizing different prediscrimination procedures
have been described in recent reports
(Amsel, 1962; Amsel & Ward, 1965).

1 The research described in this report was
supported by grants from the National Science
Foundation (GB-3772) and from the National Research
Council of Canada (APT 72). We are greatly
indebted to John C. Ogilvie for his advice and
assistance in the statistical analysis of the data of
these experiments.

* Now at Connecticut College.

In one experiment, prediscrimination 
training prior to a black-white discrimina- 
tion consisted of reinforcing the response to 
black on an intermittent (50%) basis, while 
the same response to white was being
reinforced on every occasion. At first, we were
interested in what effect this
prediscrimination condition might have on subsequent
discrimination learning, and this feature of
the experiment is reported in detail in a
recent monograph (Amsel & Ward, 1965). 
However, the focus of our interest changed 
from the discrimination phase of the
experiment to the prediscrimination phase itself,
and this is the point of departure for the 
present monograph. It became apparent 
while running the prediscrimination phase 
that we were involved in the study of partial 
reinforcement (PRF) effects within Ss 
(Amsel, MacKinnon, Rashotte, & Surridge, 
1964). We observed that in the prediscrim- 
ination phase the B+ stimulus was operating 
within a given S as a PRF condition, while 
the W+ stimulus was operating like a con- 
tinuously reinforced condition, in the sense 
that several of the phenomena of partial- 
reinforcement acquisition that had previ- 
ously been reported in between-S types of 
experiments (e.g., Goodrich, 1959) were also 
discernible within an S although the within-S 
and between-S phenomena were by no means 
identical. 

While the phenomena uncovered by Amsel 
et al. seemed clear enough, the experiment 
itself was incomplete and deficient in several 
respects, the main value of the study being 
that it seemed to suggest an important 
approach to the study of mediational factors 
operating in partial reinforcement. It also 
pointed up the fact that between-S and
within-S PRF conditions represented poten- 
tially important differences in psychological 
processes—differences it might pay us to 
understand better. Our present task is, then, 
to conduct PRF experiments that will (a) 
eliminate the deficiencies of our first within- 
S experiment, and (b) allow us to make 
meaningful comparisons between the within- 
S and between-S cases. We will introduce 
the two facets of our purpose in these experi- 
ments in reverse order. 


THE STUDY OF PARTIAL
REINFORCEMENT


The phenomena which fall under the 
heading of partial reinforcement effects have
been among the most examined and most
interpreted in psychology for the last 20 
years. While this is not the place to review in 
detail the great variety of findings and 
interpretations that might be included in 
this area, an outline of such a review might 
nevertheless be in order to delimit that 
portion of the larger study of partial rein- 
forcement which will concern us here. 

A response is said to be partially or inter- 
mittently reinforced if it is rewarded accord- 
ing to some probability less than unity, and 
according to any of a variety of patterns. 
While early references to reinforcement on a 
schedule of less than 100% can be found in 
Pavlov (1927) and Skinner (1938), the 
major early systematic attack on the prob- 
lem was by L. G. Humphreys who, from 
1939 to 1943, published a variety of experi- 
ments in which he compared partial with 
continuous reinforcement. He studied a 
diversity of responses including eyelid con- 
ditioning (Humphreys, 1939a), verbal ex- 
pectancy or guessing (1939b), galvanic skin 
response (GSR) conditioning (1940), and the 
acquisition of bar pressing in a Skinner box 
(1943), and came to the conclusion, as did 
an early review of the literature by Jenkins 
and Stanley (1950), that partial as compared 
with continuous reinforcement resulted in 
greater resistance to extinction, and that 
partial reinforcement acquisition was either 
only slightly inferior to or equal to
acquisition under continuous reinforcement
conditions. Since that time, several major lines
of research on partial or intermittent rein- 
forcement have been developing, and it is 
at least approximately the case that each 
of the various lines of investigation pursued 
by Humphreys has become a specialized 
area in its own right. For example, the study 
of verbal expectancy or guessing has been 
done almost exclusively by those interested 
in the development of mathematical models; 
while investigators with seemingly quite 
different interests have pursued the discrete- 
trial instrumental learning and the free- 
operant investigations of partial reinforce- 
ment phenomena, using mainly nonhuman 
Ss. 
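
The definition given at the start of this section, reward according to some probability less than unity and according to any of a variety of patterns, can be illustrated by generating a trial schedule. The sketch below is one hypothetical way of doing so: it fixes the proportion of rewarded trials exactly and randomizes only their order. The function name, the number of trials, and the seed are assumptions made for illustration and do not describe the schedules used in the experiments reported here.

```python
import random


def make_schedule(n_trials: int, proportion_rewarded: float, seed: int = 0):
    """Return a list of booleans, True marking a rewarded trial.

    The number of rewarded trials is fixed at
    round(n_trials * proportion_rewarded); only the order is randomized.
    """
    if not 0.0 <= proportion_rewarded <= 1.0:
        raise ValueError("proportion_rewarded must lie between 0 and 1")
    n_rewarded = round(n_trials * proportion_rewarded)
    schedule = [True] * n_rewarded + [False] * (n_trials - n_rewarded)
    random.Random(seed).shuffle(schedule)   # seeded for reproducibility
    return schedule


prf = make_schedule(20, 0.5)   # partial reinforcement: half the trials rewarded
crf = make_schedule(20, 1.0)   # continuous reinforcement: every trial rewarded

print(sum(prf), sum(crf))
```

Other patterns, such as fixed alternation or limits on runs of nonrewarded trials, would require constraints on the ordering that this sketch does not impose.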

In this report, we will be studying discrete- 
trial instrumental learning with moderate 
spacing of trials. Therefore, our discussions
will not, necessarily, have relevance for the 
free-operant case, nor even for the discrete- 
trial case with extremely short intertrial 
intervals. Our explanations involve classical 
conditioning as an explanatory model for 
PRF effects in instrumental behavior, but 
we do not investigate classical conditioning 
directly. An explanation which involves 
hypothetical, classically conditioned internal 
responses as mediators of overt instrumental 
behavior cannot, at the same time, account 
for partial reinforcement effects in classical 
conditioning, and we will take the position, 
as Spence (1960) has, that different explana- 
tory schemes will probably be required to 
understand PRF effects in classical condi- 
tioning and instrumental learning. 

It will also be clear that our phenomena 
are not coextensive with those studied by 
Skinner and his associates under the heading 
of Schedules of Reinforcement (e.g., Ferster & 
Skinner, 1957). The Skinnerian situation, 
involving free responding, is different from 
the discrete-trial runway situation in a 
variety of ways; but perhaps the most 
important difference is in the separation of 
trials, that is to say, in the integrity of the 
individual-trial experience. When a Skin- 
nerian speaks of fixed-interval and variable- 
interval schedules, or of fixed-ratio and 
variable-ratio schedules, or of any of the 
number of other compound schedules built 
of these four basic elements, he is involved 
very importantly in the concept of response 
chaining: the feedback stimulation arising 
from the nth response may be part of the 
stimulus pattern affecting the n + 1th 
response; the reinforcement of the nth re-
sponse may also affect the n − 1th response,
and even other responses more remote in 
the chain when the organism effects a burst 
of responses for a terminal reinforcement. 
In our work, we are at some pains, as will 
be evident from some of our design con- 
siderations, to achieve discreteness of trials. 
While we may not always be entirely success- 
ful in this, the type of thinking we pursue 
is most effective in those situations in which 
the carried-over traces or aftereffects from 
one trial are not a part of the stimulus 
complex for the next trial. 


Having restricted our area of interest to 
the discrete-trial, instrumental learning 
situations, we can now pursue the distinction 
between within-S and between-S PRF situa-
tions, venturing some opinions on the psycho-
logical significance such a distinction may
have. In between-S designs, one group of 
Ss performs under conditions of partial 
reinforcement and the other under con- 
tinuous reinforcement, and the effects of 
partial reward on the form of acquisition 
and extinction curves are examined in terms 
of between-group differences. In the within-S 
experiment, the same S is partially rewarded 
for a response in the presence of one stimulus 
(S1) and continuously rewarded for the same
response in the presence of another stimulus 
(S2). The comparison in such experiments is 
between the performance of a group of Ss to 
one stimulus as compared with the per- 
formance of the same group of Ss to another 
stimulus. (Of course, this is identical in 
principle to the analysis of data from dis-
crimination learning with separate presenta-
tion of stimuli.)

To reduce the comparison of the be- 
tween-S and within-S cases to its most 
simple and ideal level would require a be- 
tween-S experiment involving two Ss and 
a within-S experiment involving one. (This, 
of course, is approximated in some of the 
Skinnerian reports.) In the two-S experi-
ment, each of the two (presumably otherwise
identical) Ss would be exposed to a different
reinforcement condition, one continuous,
the other partial. The one-S experiment
would involve the development within the
S of two separate systems or processes, one
related to S1 and the other related to S2.
It is difficult to maintain that one form of 
the experiment has any advantage over the 
other; they are simply different. The psycho- 
logical justification for the between-S ex- 
periment would be that it represents an 
experimental model for how separate but 
similar organisms are affected by different 
patterns of reinforcement in relation to the 
same response. The within-S experiment, 
on the other hand, is a model for the develop- 
ment within the same organism of different 
systems or processes relative to different 
environmental events with associated rein-
forcement contingencies. An example of such
between-S and within-S differences that 
immediately comes to mind is from that 
complex of stimulus-response-reinforcement 
relationships that exists in “the family.” 
Forgetting for the moment about complex 
interactions between hereditary and environ- 
mental factors determining personality, and 
assuming that the family we are considering 
is composed of mother, father, and two 
children (why not identical twins?), all of 
the elements for our comparison are present. 
Let father be Stimulus 1 and mother Stimu-
lus 2 (no priority intended); and let the
children be Subject 1 and Subject 2. Each
child is then an S in both a within-S and a 
between-S experiment to the extent that 
(a) each, separately, may be on a different 
discrete-trial schedule of reinforcement in 
relation to the two parents for the same 
behavior, and (b) the two children may be 
on different schedules of reinforcement in 
relation to each parent for the same kind 
of behavior. The complexity of the relation- 
ships that are possible, even in this next-to- 
simplest of family situations, will be appar- 
ent, and the kinds of question that this 
situation raises will be obvious. Can the 
same child learn different patterns of vigor- 
persistence relationships to the two parents 
as stimuli and reinforcing agents? Assuming 
that this is so, still, to what extent will the 
child who has learned persistence in relation 
to the inconsistent reinforcing tactics of one 
parent manifest persistence also in relation 
to the other parent who has been more 
consistent? These are within-S questions. 
The between-S questions might apply to 
the relationships between one parent and 
both children, or between the much more 
complex relationships that hold between 
both parents and both children. Between-S 
experiments, particularly those run in the 
laboratory and particularly, again, those 
using animals as Ss, are asking very simple 
questions compared to those that might be 
asked about the family relationship. The 
basic question, of course, is, to what extent
can differences in reinforcement produce 
two different organisms, one relatively more 
persistent—or more vigorous, or both—than 
the other? 


Our introduction to between- versus
within-S differences has been in terms of a
specific example. We will develop some of
these ideas in a more general and abstract
way later. But for the moment we turn to
the second purpose of our experiments, which
is to follow up our first within-S experiment
and try to eliminate some of its deficiencies.


DEFICIENCIES IN THE AMSEL, MACKINNON,
RASHOTTE, AND SURRIDGE
EXPERIMENT


The experiment of Amsel et al. (1964) 
was incomplete in at least four important 
respects. 

1. It was run under only one condition 
of color, that in which the black stimulus 
signaled partial and the white stimulus 
signaled continuous reinforcement. The clear 
possibility remained that some interactions 
of color and reinforcement pattern might be 
operating and that reversing the color condi- 
tions in relation to reinforcement might 
produce a different result. 

2. The experiment was run under condi- 
tions which equated the two stimuli for 
numbers of reinforcements rather than 
trials. In the usual between-S PRF experi- 
ment, two groups of Ss are run, one under 
continuous, the other under partial reinforce-
ment. Ordinarily, both groups experience
the same number of trials during the experi-
ment but are different with respect to
numbers of reinforcements, the continuous
group usually receiving twice as many as
the partial group. Our experiment was run 
three trials a day, with two trials to the black 
stimulus, partially reinforced, and one 
trial to the white stimulus, continuously 
reinforced. There remained a question 
whether the phenomena demonstrated under 
the B±W+ conditions (reinforcements
equated) could be replicated under B3: W+ 
conditions (trials equated). 

3. Our experiment contained no direct 
comparison of the within-S result with an 
appropriate between-S condition. We simply 
compared our result to the various be- 
tween-S experiments that had already been 
published, having great confidence in the 
between-S results which had been reported 
in many separate and independent investiga-
tions.

4. Our report contained no extinction 
data, since the experiment itself was run
as the initial phase of an experiment in which
the second phase was discrimination learn-
ing. The Amsel and Ward monograph to
which we have referred did suggest that
following many trials of B±W+ predis-
crimination experience “discriminative ex-
tinction” was slower in a B−W+ discrimi-
nation than in a B+W− discrimination,
providing what appeared to be evidence for
differential extinctive effects attributable
to within-S PRF acquisition; however, we
had no data to tell us how the same S would
extinguish to both black and white stimuli
after this sort of prediscrimination training,
or how a group of Ss would extinguish to
only black as compared to another group
to only white when both groups had had
the B±W+ preextinction experience.


THE EXPERIMENTS 


The present monograph describes four 
experiments which (a) extend the Amsel, 
MacKinnon, Rashotte, and Surridge experi- 
ment and cover the deficiencies which have 
been described; (b) examine in a detailed 
fashion the kinds of acquisition and extinc- 
tion phenomena that emerge out of the 
within-S PRF experiment when it is run 
under a variety of conditions; and (c) make 
such comparisons as are possible between
the within-S and between-S experiments.

Much of what we tried to do in these 
experiments can be understood from a 
schema shown in Figure 1. This figure says, 
essentially, that in PRF experiments acquisi- 
tion can be either between Ss or within S, 
that extinction following within-S acquisi- 
tion can also be either between Ss or within 
S, and that extinction following between-S 
acquisition must necessarily be between Ss, 
since there is then no basis for within-S 
extinction. In between-S acquisition, as 
shown in the top left of the schema, two 
Ss or two groups of Ss are exposed to the 
same stimulus conditions, but in one group
the stimulus always precedes partial rein- 
forcement, whereas in the other the same 
stimulus always precedes continuous rein- 
forcement. To test for the PRE in extinction, 
both groups are presented with the same 
stimulus, but in no case is either S reinforced 
for the response it makes to that stimulus. 


SCHEMA OF BETWEEN-S AND WITHIN-S
EXPERIMENTAL CONDITIONS FOR PARTIAL REWARD EXPERIMENTS

[Figure 1 schema. Columns: ACQUISITION; EXTINCTION (BETWEEN S, WITHIN S). Rows: USUAL BETWEEN S, one stimulus under partial (P) or continuous (C) reinforcement; WITHIN S (PC), two stimuli; BETWEEN S (PP) and (CC), two stimuli each.]

Fig. 1. The row above the double line (be-
tween-S) represents the usual partial reinforce-
ment experiment; the second row represents
within-S acquisition followed by either between-S 
or within-S extinction. The bottom two rows, 
(PP) and (CC), together constitute the appro- 
priate between-S control for the within-S (PC) 
experiment. 


The basic within-S condition (PC) is shown 
in the next row of the diagram. In this case 
the same S experiences two stimuli and, 
while response to Stimulus 1 is partially
reinforced, response to Stimulus 2 is con-
tinuously reinforced. Following such within-
S acquisition, extinction may be run in a
between-S or within-S manner. In between-S
extinction for such a condition, half of the
Ss are extinguished only to Stimulus 1 while
the other half are extinguished only to
Stimulus 2, all Ss having been exposed to
both stimuli during acquisition. Within-S
extinction following within-S acquisition
allows every S to be exposed to both stimuli
during extinction and to experience non-
reward in relation to both. In both kinds of
extinction following within-S acquisition we 
are testing for the effects of reinforcement 
schedules applied during acquisition. 

The third and fourth rows of this schema 
(PP) and (CC) represent our conception 
of the appropriate between-S comparison 
for the within-S experiment. These are 
within-S cases only in the sense that S is
exposed to two stimuli as he is in the
within-S PRF experiment. However, in the 
PP case S experiences PRF in relation to 
both stimuli, while in the CC case con- 
tinuous reinforcement (CRF) in relation to 
both stimuli. Consequently, PP versus 
CC is the same as the between-S PRF at 
the top of the schema, with the one difference
that each S in both groups is exposed to
two stimuli (S1±S2± or S1+S2+) instead
of only one, as is the case in the ordinary 
between-S experiment. It is in this same 
limited sense that such Ss may be extin- 
guished in the between-S or within-S mode. 
All this means for the PP and CC cases is 
that an S who has had PRF in relation to 
two stimuli or CRF in relation to two stimuli 
may undergo extinction to only one of these 
stimuli or to both of them. To reiterate, 
the rows of our schema which are designated 
within-S (PC), between-S (PP), and be- 
tween-S (CC), together constitute our con- 
ception of a within-S experiment: a group 
which is reinforced on two schedules (PC) 
is run along with appropriate between-S 
“control” groups (PP and CC). 

There is reason to question whether the 
particular comparisons of between-S and 
within-S experimental conditions we under- 
took are the “correct” comparisons. How 
should the conditions of the two kinds of 
experiment be arranged in order to afford 
reasonable comparison? After examining 
this question at some length we were forced 
to conclude that any comparison of be- 
tween-S and within-S results, in the sense 
that data from a control and an experimental 
group are compared, is probably impossible, 
and that all we can expect to do is to compare 
the kinds of results that can be obtained 
under a number of versions of the two 
experimental forms. 

To begin with, it is obvious that the 
within-S experiment can be conducted in a 
manner that equates for number of rein- 
forcements or for number of trials to the 
two stimuli. Using a generalized notation, 
we would designate the first case $:+8.+, 
the second 8,82, and we have actually 
conducted both forms of the experiment. 

However, considering only the S;+Ss-+ 
case, the equivalent between-S experiment 
is difficult to arrive at. First of all, the 
within-S condition represents 75% rein- 
forcement if one ignores the two colors, and 
50% versus 100% reinforcement if one 
considers that the separate colors control 
separate systems within the organism. Simi-
lar considerations apply to the appropriate
numbers of trials for the between-S “con-
trol.” Certainly each S in the within-S
condition has four trials in each S,-ESs3E 
block, but only two trials to each of the 
stimuli. The question then is whether the 
appropriate block of trials in the between-S 
comparison should have four trials or two. 
Assuming that we choose the two-trial 
alternative, is the proper between-S com- 
parison to B+W+ two groups, a partial 
B+ group and a continuous W+ group? 
Or is it the more usual PRF experimental 
condition in which color is held the same 
between the two groups, B+ versus B=? 
Or W versus W? Or both? Assume, on 
the other hand, that one selects for the 
between-S comparison experiment the four- 
trial alternative. Now every S in the experi- 
ment will experience the same number of 
trials; but for the within-S group, the be- 
tween-S partial group, and the between-S 
continuous group respectively, the overall 
percentages of reinforcement will be 75, 
50, and 100. If this is the choice, again the 
same questions as to color apply for the 
between-S conditions. Should Ss be exposed 
to one stimulus color only, the same across 
partial and continuous groups? Or should 
each S see one color but a different one in 
the PRF than in the CRF group? Or should 
each S see two colors as in the within-S 
condition, but with the same reinforcement 
schedule connected to both colors (B+W 
versus BEW)? 
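
The percentages discussed above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes a four-trial block with two trials to each stimulus, and the names `within_block`, `pp_block`, and `cc_block` are hypothetical labels, not notation from the experiments.

```python
def percent_reinforced(trials):
    """Overall percentage of rewarded trials in one block.

    trials: list of (stimulus, rewarded) pairs.
    """
    rewarded = sum(1 for _, was_rewarded in trials if was_rewarded)
    return 100.0 * rewarded / len(trials)


def percent_by_stimulus(trials):
    """Percentage of rewarded trials computed separately for each stimulus."""
    result = {}
    for stimulus in sorted({s for s, _ in trials}):
        subset = [t for t in trials if t[0] == stimulus]
        result[stimulus] = percent_reinforced(subset)
    return result


# Assumed four-trial blocks, two trials to each stimulus.
# Within-S (PC): S1 partially rewarded, S2 always rewarded.
within_block = [("S1", True), ("S1", False), ("S2", True), ("S2", True)]
# Between-S partial group (PP): both stimuli partially rewarded.
pp_block = [("S1", True), ("S1", False), ("S2", True), ("S2", False)]
# Between-S continuous group (CC): both stimuli always rewarded.
cc_block = [("S1", True), ("S1", True), ("S2", True), ("S2", True)]

if __name__ == "__main__":
    for name, block in [("PC", within_block), ("PP", pp_block), ("CC", cc_block)]:
        print(name, percent_reinforced(block), percent_by_stimulus(block))
```

Ignoring the stimuli, the three blocks come to 75, 50, and 100 percent reinforcement; taken stimulus by stimulus, the within-S block separates into 50 percent and 100 percent.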

We selected this last alternative as repre- 
senting the most reasonable way to compare 
between-S and within-S conditions, since 
it meant that each S in the between-S con- 
dition would have the same number of trials 
as an S in the within-S condition and would 
also experience two alley colors during the 
experiment. This produced some interesting 
results along with some difficulties. 


EXPERIMENT 1


In the pilot experiment of Amsel et al. 
(1964), one group of Ss acquired a running 
response in a straight alley under two stimu- 
lus conditions related to two conditions of 
reinforcement. When the alley was black S 
was rewarded on half the trials, and when 
the alley was white S was always rewarded. 
Each S ran three trials on each day, two
of these black and one white, for 108 days.
Measures of time to traverse the 62-in.
runway were taken over the early, middle,
and late sections of the alley: a starting
measure, a running measure, and a goal
measure, respectively. The data show that
in the first two sections of the alley Ss run 
faster at asymptote on partial (black) trials 
than on continuous (white) trials, but that 
in the goal region Ss run slower on partial 
than on continuous trials, suggesting that 
the usual between-group partial reinforce- 
ment acquisition effect at asymptote can be 
reproduced within Ss. 

The purpose of the first experiment was 
to replicate and extend the earlier experi- 
ment. Specifically, a group was run under 
B±W+ conditions, as in the previous
experiment, and another group was added
and run with the color-reinforcement rela-
tionship reversed (W±B+). Also, extinction
trials were run to determine whether the 
usual between-S partial reinforcement ex- 
tinction effect could be reproduced under 
within-S conditions. The extinction mode 
applied here was within-S, all Ss undergoing 
extinction to both stimuli. 


Method 


Subjects. The Ss were 20 experimentally naive
male albino rats of the Wistar strain from Woodlyn 
Farms, Guelph, Ontario. They were about 110 
days old when received in the laboratory, and 
about 130 days old at the start of the experiment. 

Apparatus. The apparatus consisted of a pair 
of runways, one white the other black, each of 
which could be aligned with a common entry 
box-start box unit, painted gray. 

Response-time measures could be taken over
either three 1-ft. segments (starting, running, and
goal) or, by extending the alley, over five 1-ft.
segments, involving three intermediate running
measures. Experiments 1 and 4 employed the
runways in their lengthened (five-segment) form,
and Experiments 2 and 3 employed the shorter
version.

The entire apparatus was constructed of ¾-in.
pine, and was covered throughout with ¼-in.
Plexiglas, hinged to allow access to the runways.
The runways were 3¾ in. high and 2⅞ in. wide.
The walls and wooden floor sections were painted
either flat black or flat white. The entry box (11
in. × 3 in. × 3¾ in.) and the starting section
(10¼ in. × 2 in. × 3¾ in.) of the apparatus were
painted flat gray. The entry box was separated
from the start chamber by a gray, metal, guil-
lotine-type door. A clear ¼-in. Plexiglas door
separated the start chamber from the runway
portion of the apparatus.

In Experiment 1 the two straight runways, 
white and black, were used in their lengthened 
mode. Five performance measures were taken (by 
means of Model 120A Hunter KlocKounters 
activated by a photoelectric system) over five 
1-ft. segments. The sequence of starting and
stopping the KlocKounters was as follows: (a) 
Raising the Plexiglas start door at the entrance 
to the runway opened a microswitch which acti- 
vated the first clock. (b) When S traversed 1 ft. 
of the runway, a photobeam was interrupted 
stopping Clock 1 and starting Clock 2. Clock 1, 
then, provided the start measure. (c) A second 
photobeam located 24 in. from the start door, when 
interrupted, stopped Clock 2, providing the Run 
I measure, and activated Clock 3, and so on, until 
the interruption of a photobeam located 3 in. 
from the end of the alley stopped Clock 5 providing 
a goal measure. A small metal food cup extending 
the width of the alley and with a lip to conceal 
the presence of a food pellet was suspended on 
the end wall of the runway, 3 in. beyond the last 
photobeam and 2¼ in. from the alley floor.
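
The chained-clock arrangement described above amounts to taking successive differences between event times. The following is a minimal sketch under stated assumptions: the event timestamps, the function name `segment_times`, and the segment labels are hypothetical and serve only to illustrate how five measures fall out of one door-opening event and five photobeam interruptions.

```python
# Segment labels are illustrative; the text names start, Run I, and goal,
# with intermediate running measures between them.
SEGMENTS = ("start", "run_1", "run_2", "run_3", "goal")


def segment_times(door_open, beam_times):
    """Return the time spent in each alley segment.

    door_open:  time at which the start door was raised (starts Clock 1).
    beam_times: times at which the five photobeams were interrupted, in
                order; each interruption stops one clock and, except for
                the last, starts the next.
    """
    if len(beam_times) != len(SEGMENTS):
        raise ValueError("expected one photobeam time per segment")
    events = [door_open, *beam_times]
    if any(later < earlier for earlier, later in zip(events, events[1:])):
        raise ValueError("event times must be in nondecreasing order")
    return {
        name: later - earlier
        for name, earlier, later in zip(SEGMENTS, events, events[1:])
    }


if __name__ == "__main__":
    print(segment_times(0.0, [1.5, 2.0, 2.5, 3.0, 4.0]))
```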

Metal retrace doors painted either flat black 
or flat white, corresponding to the walls and floors 
of the runways, confined S to the food cup area. 
The length of the goal segment was 12 in. 

The apparatus was illuminated by two 15-watt 
bulbs suspended in milk-glass globes approxi- 
mately 96 in. above the runways. The only other 
source of light in the experimental room was one 
7½-watt frosted glass bulb suspended above the
clocks. 

Procedure. Three weeks before the beginning 
of the experiment Ss were placed on a 24-hr. food 
deprivation schedule. During this period each S 
received a daily ration of 11 gm. of Purina lab 
chow in its home cage with ad libitum water, and 
each S was handled every 2 days for a few minutes. 
There was no habituation to or prefeeding in the 
apparatus. 

The Ss were run in squads of 10, with five Ss 
from each color group randomly assigned to each 
squad. Squads were taken to the experimental 
room in a 10-place carrying cage and waited 10
min. before the first trial of the day.

The experiment involved 270 acquisition trials 
and 36 extinction trials. During acquisition there 
were three trials per day, two to the partial stimu- 
lus and one to the continuous. The six possible 
trial sequences were randomly assigned within 
successive 6-day blocks for each S over the dura- 
tion of the experiment. Ss were run with a mini- 
mum intertrial interval of 15 min. 
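
One way to picture the assignment of trial sequences is sketched below. It rests on two assumptions not spelled out in the text: that the six sequences are the six orderings of a rewarded partial trial, a nonrewarded partial trial, and a continuous trial, and that each ordering is used once, in random order, within every 6-day block. The labels `P+`, `P-`, and `C` and the function name are hypothetical.

```python
import itertools
import random

# Hypothetical labels: rewarded partial, nonrewarded partial, continuous.
TRIAL_TYPES = ("P+", "P-", "C")


def acquisition_schedule(n_days=90, rng=None):
    """Build one S's daily trial sequences for acquisition.

    Assumes each of the six possible orderings occurs once, in random
    order, within every successive 6-day block.
    """
    rng = rng or random.Random()
    sequences = list(itertools.permutations(TRIAL_TYPES))  # six orderings
    if n_days % len(sequences) != 0:
        raise ValueError("n_days must be a multiple of 6")
    schedule = []
    for _ in range(n_days // len(sequences)):
        block = sequences[:]
        rng.shuffle(block)  # random assignment within the 6-day block
        schedule.extend(block)
    return schedule


if __name__ == "__main__":
    days = acquisition_schedule(rng=random.Random(1))
    print(len(days), "days,", sum(len(d) for d in days), "trials")
```

With 90 days at three trials per day the schedule contains the 270 acquisition trials reported for this experiment.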

The procedure on any individual trial was as 
follows: (a) S was removed from the carrying cage 
and placed in the entry box. (b) When S had 
oriented in the direction of the runway, the gray 
metal door was raised, allowing S to enter the 
start box. (c) Approximately 1 sec. later, the clear 
Plexiglas door leading to the runway was raised 
and S was allowed to traverse the runway and
enter the goal area where, on reward trials, the 
food cup was baited with one 500-mg. Noyes 
pellet. (d) The retrace door was lowered, confining 
S in the goal box. On a reward trial S remained 
in the goal box until the pellet was consumed. 
(e) S was removed from the goal box and placed 
back in the carrying cage. On nonreward trials 
the procedure was identical except that S was 
removed from the goal box after 15 sec. Following 
the last trial of a day, Ss were placed in their home 
cages, and 40 min. later received the remainder 
of their daily ration. In order to reduce the possi- 
bility of discrimination on the basis of olfactory 
cues in the goal area, ground food, not of course 
visible to S, was scattered beneath the runway. 
Also, prior to each individual trial whether re- 
warded or nonrewarded, the goal box was swept 
clear of any traces of food particles from the 
previous trial. Consequently, the pretrial mani- 
pulations of the experimenter (E) were nondiffer- 
ential over all trials, decreasing the likelihood that 
S might pick up a cue for reward or nonreward 
from E’s activity. (These procedures were followed 
in all experiments.) 

During extinction, Ss were run as in acquisition 
except that food was never available in the goal 
box and Ss were detained in the goal box for 15 
sec. on all trials. If S failed to traverse any of the 
five 1-ft. segments of the alley within 60 sec., he 
was removed from the alley and all subsequent 
segments were scored as 60 sec. for that trial. 
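
The 60-sec. scoring rule and the conversion of times to speeds (reciprocals) can be sketched as follows. This is an illustration under assumptions: the segment on which S failed is itself taken to be scored as 60 sec., a missing time is represented by `None`, and the function names are hypothetical.

```python
MAX_SEC = 60.0  # criterion time per 1-ft. segment during extinction


def apply_cutoff(times):
    """Score one trial's five segment times under the 60-sec. rule.

    times: segment times in seconds, in alley order; None marks a segment
           that was never timed because S had been removed.
    Assumption: the segment in which S failed is scored as 60 sec., as are
    all subsequent segments.
    """
    scored = []
    removed = False
    for t in times:
        if removed or t is None or t >= MAX_SEC:
            scored.append(MAX_SEC)
            removed = True
        else:
            scored.append(t)
    return scored


def mean_speed(times):
    """Mean reciprocal (speed) over a set of times for one measure."""
    return sum(1.0 / t for t in times) / len(times)


if __name__ == "__main__":
    print(apply_cutoff([2.0, 3.0, None, None, None]))
    print(mean_speed([2.0, 4.0]))
```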


Results and Discussion³


The results of the first experiment are 
shown graphically in three figures. Figure 2 
shows group curves for all alley measures 
and for acquisition and extinction in mean 
reciprocals (speeds). The acquisition data are 
plotted in 6-day blocks, the extinction data 
in 3-day blocks. Figure 3 presents individual
S acquisition data for all measures for three
Ss which represent types of performance
observed in this experiment. Figure 4 shows
extinction performance for each S in terms
of the goal measure.

3 Analyses of variance of speed data in this
report are those appropriate to split split-plot
designs. In the present analyses there is a be-
tween-S factor (“Color Groups” in within-S
analyses; “Reinforcement” in between-S analy-
ses); a within-S factor (“Days”) and its interaction
with the between-S factor; and a second within-S
factor (“Reinforcement” in within-S analyses;
“Color” in between-S analyses) and its associated
interactions. Appropriate error terms were com-
puted to test each of these effects. A complete
summary table of a within- and between-S analy-
sis is presented in Tables 1 and 2 for the fourth
experiment, which is the largest in that there are
five alley measures and both within- and be-
tween-S groups. This summary of the analyses
typifies the treatment of speed data in all of the
experiments. In the first three experiments we
report only those factors in the analyses which
reach acceptable confidence levels.

Acquisition. The results as presented in 
Figure 2 are separated for the two color 
conditions. The acquisition data for the 
group in which black is partial (B±W+)
replicate and extend the previously pub-
lished data. Performance to B± is more
vigorous than to W+ in all the measures 
beginning with the second 18-trial (6-day) 
block and remains so in the start and Run I 
measures. In the other running measures 
and in the goal measure this difference 
disappears, and in all measures it decreases, 
progressively and systematically working 
forward in time from the goal measure where 
a clear reversal becomes apparent. These 
data would tend to support an interpreta- 
tion of increasing aversiveness to the partial 
stimulus as the goal is approached, this 
aversiveness becoming attached to cues 
earlier and earlier in the runway after ex- 
tended training. 

The acquisition data from the W±B+
group are shown in the left-hand panel, 
and they do not, at first glance, seem to show 
the same terminal effects, especially in the 
goal measure. However, close examination 
will reveal that the differences in perform- 
ance to the two stimuli are much greater 
when white is the partial stimulus than when 
black is, and that the differences across a
horizontal comparison of conditions are due 
largely to faster running to the partial 
stimulus when it is white. There seems to be 
little, if any, difference in speed between
W+ and B+.

Fig. 2. Acquisition and extinction data from five alley segments for the two within-S color groups
of Experiment 1. Acquisition data are plotted in 6-day blocks (mean of 12 partial trials and 6 continuous
trials) and extinction data are plotted in 3-day blocks.

Analyses of variance for the acquisition
data were conducted separately for three
successive 30-day blocks and for each of the
five measures. The three successive blocks
of days are equivalent to Trial Blocks 1-5,
6-10, and 11-15 in Figure 2. Three factors
were included in these analyses: a between-S
factor, Color Group, and two within-S
factors, Day and Reinforcement. The Rein-
forcement factor in these analyses refers to
the three types of trial on each day (Con-
tinuous versus Partial+ versus Partial−),
and subsequent linear contrasts allowed a
partial versus continuous comparison as well
as a test of the difference between the two
types of partial trial. The latter test was
made to determine if Ss were discriminating
between the positive and negative partial
trials (presumably on the basis of firstness
and secondness), for which there seemed to
be some evidence earlier (Amsel et al., 1964).
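
The two linear contrasts on the three trial types can be illustrated with a short sketch. The weights below are one conventional choice and are an assumption; the text does not give the weights actually used, and the sketch computes only the contrast values, not the associated significance tests. All names are hypothetical.

```python
# Hypothetical labels: continuous, rewarded partial, nonrewarded partial.
CONTRASTS = {
    # Mean of the two partial trial types versus the continuous trial.
    "partial_vs_continuous": {"C": -1.0, "P+": 0.5, "P-": 0.5},
    # Rewarded partial trial versus nonrewarded partial trial.
    "partial_plus_vs_minus": {"C": 0.0, "P+": 1.0, "P-": -1.0},
}


def contrast_value(means, weights):
    """Weighted sum of condition means; the weights must sum to zero."""
    if abs(sum(weights.values())) > 1e-12:
        raise ValueError("contrast weights must sum to zero")
    return sum(weights[k] * means[k] for k in weights)


def are_orthogonal(w1, w2):
    """True when two contrasts over equal-n conditions are orthogonal."""
    return abs(sum(w1[k] * w2[k] for k in w1)) < 1e-12


if __name__ == "__main__":
    # Made-up mean speeds, for illustration only.
    means = {"C": 2.0, "P+": 3.0, "P-": 3.0}
    for name, weights in CONTRASTS.items():
        print(name, contrast_value(means, weights))
```

Because the two sets of weights are orthogonal, the partial versus continuous comparison and the Partial+ versus Partial− comparison address separate portions of the Reinforcement effect.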

Analyses for the first 30-day block show 
that the Day, Color Group X Day and 
Reinforcement effects were significant in 
each measure (all at the .001 level), with no 
other effects reaching significance. Linear 
contrasts on the means for each type of 
reinforcement showed no significant differ-
ence between the Partial+ and Partial−
trials in any measure in either Color Group,
and a reliable difference between partial
and continuous trials only in Run II in the
W±B+ group and in Run III in the
B±W+ group (p < .05 in each case). The
significant Day factor reflects the acquisition 
of the running response and the significant 
Color Group X Day interaction reflects 
faster running in the W±B+ group, due
mainly to the higher speeds to W±.
Analyses for the second 30-day block 
show a significant effect of Day in all meas- 
ures except Run I; of Reinforcement in all 
measures except goal; and of Color Group X 
Reinforcement in all measures except start 
and Run I. No other effects were significant. 
The Day effect reflects changes in perform- 
ance across days which does not seem neces- 
sarily to be related to the acquisition of the 
response. Linear contrasts show no difference 
between the two types of partial trial in 
any measure in the W±B+ group, and a
reliable difference of this sort only in the
start measure in the B±W+ group. Since
this is the only occurrence of a difference
between Partial+ and Partial− trials in the entire
experiment (out of 30 such comparisons),
it presumably does not reflect a systematic
discrimination by the animals. Contrasts
between partial and continuous trials for
the W±B+ group showed significant differ-
ences in Runs I, II, and III but no signifi-
cant difference in start or goal measures. For
the B±W+ group there was a significant
difference only in the start and Run I meas-
ures. The Color Group X Reinforcement
interaction reflects the Runs II, III, and
goal differences evident in Figure 2, the bulk
of this difference being due to differential
performance to the partial stimulus by the
two groups, dropping off in the B±W+
group and remaining at asymptote in the
W±B+ group.

The analyses for the last 30-day block 
show a significant Color Group effect in the 
Run III (p < .05) and the goal measure 
(p < .01), and this factor just fails to reach 
the .05 level of confidence in Run II. There 
is also a significant Day effect in all measures 
which reaches the .05 level in the start 
measure and the .001 level in all other meas- 
ures, and a significant Color Group X Day 
effect in Run II and Run III (p < .01 each)
and in goal (p < .05). These effects reflect
an overall lower level of performance by the
B±W+ group in the last three alley meas-
ures and an overall change in performance
across days in this block of trials which takes
the form of a decrease in running speed to
both stimuli in the B±W+ group and even
a slight increase in running speed by the
W±B+ group. Reinforcement was also
significant in all measures in this block of 
trials and Color Group X Reinforcement was 
significant in all measures except start. 
Linear contrasts revealed no difference be- 
tween the two partial trials in either group, 
but a reliable difference between partial and 
continuous trials in the W±B+ group in
all measures except goal. There was no
reliable difference between partial and con-
tinuous trials in any measure in the B±W+
group. The bulk of the Color Group X
Reinforcement interaction would again seem
to be due to differential performance to
B± and W±, the former continuing the
drop off begun in the previous block of
trials.

The statistical analyses of the acquisition 
data support the relationships apparent in 
the plot of speed data in Figure 2. 

In Figure 2, the failure of reversal in the 
goal measure when the partial stimulus is 
white could be a purely physical artifact of 
the greater speed to the W± stimulus. The
direction of change in the relationship of the 
partial and continuous stimuli from start to 
goal, that is, the successive slowing down of 
performance to the partial relative to the 
continuous stimulus, seems to be the same in 
both cases; the W± group, however, starts
out faster than the B± group. It is as if the
goal panel on the left side of Figure 2 belongs 
at about the level of the Run II or Run III 
panel on the right. This color difference 
seems to appear in some of our experiments 
but not others, and it is obviously a stimulus 
intensity effect characteristic of within-S 
designs in which adaptation level (Helson, 
1964) and contrast factors would naturally 
be important (Beck, 1963; Grice & Hunter, 
1964). 

A graphical plot of individual-S acquisi- 
tion data suggests that all Ss show the 
progressive slowing down of performance to 
the partial relative to the continuous stimu- 
lus as the goal is approached but that there 
are three main ways in which this relative 
slowing down manifests itself. Figure 3 shows 
individual acquisition data for S 11, from
the W±B+ group, and Ss 10 and 4 from
the B±W+ group. S 11 is representative
of 7 of the 20 Ss, 6 from Group W±B+,
which show progressive slowing down to the
partial stimulus but no crossover of the
curves, even in the goal measure. S 10 is
representative of six of the Ss, all from
Group B±W+, which show faster running
to the partial stimulus in the early measures 
and a progressive slowing down to both 
stimuli as the goal is approached, the amount 
of slowing being greater to the partial 
stimulus. Slower running to the partial than 
to the continuous stimulus, which is char- 
acteristic of the goal measure in between-S 
experiments, is found as early as the first 
running measure in this type of animal. 


PRE WITHIN AND BETWEEN SUBJECTS 11 


Fig. 3. Data for three individual Ss from Experiment 1 representing three patterns of responding
to S1± and S2+ in the five measures. The data are plotted in 6-day blocks. (Axes: mean speed against blocks of trials.)


Finally, S 4, which is representative of six 
Ss, four from Group W±B+ and two from
Group B±W+, shows faster running to the
partial stimulus in the start and running 
measures and a tendency toward slower 
running to the partial stimulus in the goal. 
This type of S most closely resembles be- 
tween-S findings (Goodrich, 1959) and the 
within-S findings of Amsel et al. (1964). 
One S in the B±W+ group showed no
difference between partial and continuous 
performance in any measure except the goal, 
where responding was slower to the partial 
stimulus. 

These findings suggest that PRF and 
CRF, manipulated within Ss, influence vigor 
of responding in a consistent fashion: there 
is, in almost every case, relatively more
slowing to stimuli signaling PRF as the goal 
is approached. However, magnitude and
direction of absolute differences between
PRF and CRF acquisition performance over 
segments of the runway vary among in- 
dividual Ss. These individual differences in 
reaction seem clearly to be related in part to 
the physical intensity of the partial stimulus 
(white or black); they may also be related to
differences in “tolerance” for aversiveness 
(anticipatory frustration) among Ss, that 
is to say, to differences in “personality.” 
Davenport (1963a) has reported a related 
finding in an experiment in which magnitude 
rather than percentage of reward to S1
and S2 was manipulated. This was basically
an experiment on spatial discrimination 
(choice); however, forced trials were also 
administered to both stimuli, one associated 
with a five-pellet reward, the other with a
one-pellet reward, and on these trials all 
Ss showed relatively similar patterns of 
starting speed over trials to the five-pellet
stimulus, but four different response pat- 
terns to the one-pellet stimulus. 

Extinction. The extinction data for each 
measure and condition are presented to the 
right of the corresponding acquisition curves 
in Figure 2. In this experiment, a within-S 
PRE does seem to occur, at first glance. 
However, closer inspection suggests that the 
extinction differences (a) reflect mainly 
acquisition differences and (b) do not repre- 
sent the true differences in slope character- 
istic of differences in persistence between 
partial and continuous groups in between-S 
experiments. 

The analyses of variance for the extinction 
block of trials show a significant effect of 
Color Group at the .01 level in all measures
except start, which is at the .05 level; a
significant effect of Day in all measures
(p < .001); and a significant Color Group X
Day interaction in all measures except Run
II (p < .01 for start and < .001 for the
other measures). The Color Group X Day
effect reflects faster overall extinction in the
B±W+ group. Reinforcement is significant
in all measures (p < .001), and Color
Group X Reinforcement is significant only
in the Run II and III measures (p < .05
and .01, respectively). These two effects
reflect terminal acquisition levels of per- 
formance, and they, along with the lack of a 
Reinforcement X Day effect, indicate that 
the differences in extinction in Figure 2 are 
due to acquisition asymptote and are not the 
result of different rates of extinction to the 
partial and continuous stimuli, which is the
indicant of the PRE. Linear contrasts
showed reliable differences between con-
tinuous and partial extinction performance
in all measures except goal for the W±B+
group, and no differences in any measure for
the B±W+ group, thereby verifying the
graphical picture in Figure 2.


Fig. 4. Individual animal extinction data for the goal measure for 20 Ss following S1±S2+ acquisition.

Within-S experiments permit a closer look
at extinction differences of this kind, since
the performance of individual Ss to both
stimuli can be examined. Figure 4
shows individual performance curves for the 
goal measure for all 20 Ss. Ss 1-10 are in the 
B± group, and Ss 11-20 are in the W±
group. There is some suggestion that Ss 1, 
2, 3, 4, and 10 show PRE-like differences in 
slope in the goal measure as do Ss 13, 14, 
and 16. In no case was there a suggestion of 
faster extinction to the partial stimulus. 
Although the overall analysis provides no 
basis for a PRE in this experiment, these 
individual-S data seem to suggest that 
differences in rates of extinction can be 
observed in within-S PRF experiments.
However, some caution must be exercised:
the data presented were gathered under 
procedures in which the number of rewards 
(not trials) to the PRF and CRF stimuli 
were equated; the more usual procedure in 
between-S experiments is to equate number 
of trials. In between-S studies involving as 
many trials and as large a reward as we 
employ, giving twice as many rewarded 
trials to the continuous group would facili- 
tate extinction. Arguing from this, our pro- 
cedure should not particularly favor the 
appearance of a PRE. The procedure of 
running twice as many extinction trials per 
day to the previously partial stimulus, as 
we did in this experiment, should also be 
unfavorable to the appearance of a PRE. 


EXPERIMENT 2 


Experiment 2, like Experiment 1, was a 
within-S PRF procedure followed by 
within-S extinction. There were a number of 
minor differences between the two experi- 
ments and one major difference: while the 
first experiment was conducted with rein- 
forcements equated to the two stimuli 
(S:ÆS:+), the second equated for trials 


(S:-82-), the more usual case in between-S 
experiments. The other difference of impor- 
tance was that Experiment 2 was conducted 
in the shorter version of the apparatus al- 
ready described, involving only three runway 
measures instead of five. 


Method 


Subjects. The Ss were 10 male hooded rats from 
the colony maintained by the Department of 
Psychology, University of Toronto. They were 115 
days old at the beginning of the experiment. 

Procedure. Twenty-five days prior to the be- 
ginning of experimental training, Ss were housed 
in individual cages and put on a 23-hr. depriva-
tion schedule. They were fed 10 gm. Purina lab
chow daily, and water was available at all times 
throughout the experiment. During the establish- 
ment of the deprivation schedule, Ss were removed 
from their cages and handled by E for 5 min. each 
day. No other special pretraining procedures or 
habituation to the apparatus were carried out. 

The Ss were run four trials a day in the shorter 
version of the apparatus. Five Ss ran under the 
B±W+ condition while the other five Ss ran
W±B+. On any experimental day, each S ran
two trials to each stimulus. The 12 possible orders 
of trials were randomized, separately for each S, 
within each 12-day block. 
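
The count of 12 daily orders can be checked with a short sketch. The text does not say how the 12 arise; the sketch assumes that each day's four trials comprise one rewarded and one nonrewarded trial to the partial stimulus plus two rewarded trials to the continuous stimulus. The labels "P+", "P-", and "C+" and the function name are hypothetical and serve only as illustration.

```python
import random
from itertools import permutations

# Assumed daily trial set (not stated explicitly in the text):
# "P+" = rewarded partial trial, "P-" = nonrewarded partial trial,
# "C+" = rewarded continuous trial (two per day).
daily_trials = ["P+", "P-", "C+", "C+"]

# Distinct orderings: 4! / 2! = 12, since the two C+ trials are interchangeable.
orders = sorted(set(permutations(daily_trials)))


def randomized_block(seed):
    """Return one 12-day block that uses each order once, in random sequence."""
    rng = random.Random(seed)
    block = list(orders)
    rng.shuffle(block)
    return block


if __name__ == "__main__":
    print(len(orders))
    for day, order in enumerate(randomized_block(seed=1), start=1):
        print(day, " ".join(order))
```

Calling `randomized_block` with a different seed for each S would give the separate randomization per S that the procedure describes.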

Training was carried out over a period of 42 
days (84 trials to each stimulus). Within any single 
trial, the procedure followed was identical to that 
in Experiment 1. The intertrial interval was about 
15 min., and the reward magnitude, where applica- 
ble, was 500 mg. given in one Noyes pellet. 

Extinction, also, was conducted on a four-trial- 
a-day basis, again with an intertrial interval of 
15 min. 

Following the final daily trial, Ss were trans- 
ported back to their home cages where they re- 
ceived the balance of their 10-gm. daily ration 
40 min. later. 


Results 

The results of the second experiment are 
presented in Figures 5 and 6. Figure 5 
presents all of the data in 2-day blocks 
(four trials to each stimulus) for acquisition 
and extinction, averaged for all Ss across the 
two color groups. The most obvious features 
of these data, as compared to the data of 
Experiment 1 (see Figure 2), are (a) that the 
acquisition differences evident in the start
measure in Experiment 1 were not evident 
in the start measure of Experiment 2; (b) 
that the greater running speed to the partial 
stimulus which was a characteristic of all 
three running measures in Experiment 1 
was also clearly evident in the one running
measure of Experiment 2; (c) that the ap-
pearance, early in training, of greater speeds
to the partial stimulus than to the con-
tinuous in the goal measure, suggested in
Experiment 1, was also clearly evident in
Experiment 2; (d) that the ultimately slower
partial than continuous speeds found in the
goal measure in Experiment 1, at least in the
B±W+ condition, were not evident in the
combined data of the present experiment
(and were not found in either of the color
conditions); and (e) that there seemed to be,
again, little, if any, difference in the rates of
extinction to the partial and continuous
stimuli.


Fig. 5. Within-S acquisition and extinction
for combined color conditions of Experiment 2.
Acquisition and extinction are plotted in 2-day
blocks of trials (mean of four partial and four
continuous trials).

Separate analyses of variance were con- 
ducted for each of the three measures for two 
segments of acquisition (Days 1-20 and 
Days 21-42) and for extinction. The factors 
in these analyses were Color Group (a 
between-S factor), and Day and Reinforce- 
ment (within-S factors), the Reinforcement 
factor testing continuous against partial 
speeds. 

The analyses show that the Day factor is 
significant at the .01 level in all measures 
throughout the entire experiment, which 
points up the fact that the learning curves 
are changing even beyond the first 20 days 
(80 trials). Perhaps the most noteworthy 
aspects of the analysis are those related to 
the Reinforcement factor. Reinforcement 


and interactions involving reinforcement 
are not significant in any segment of the 
experiment in either the start or goal meas- 
ures. In the first 20-day segment of the run 
measure, Reinforcement and Reinforce- 
ment X Day are significant (p < .05). Over 
the latter 20-day segment of acquisition, 
Reinforcement (but not the Reinforce- 
ment X Day interaction) continues signifi- 
cant in the running measure (p < .01). All 
of this is statistical corroboration of char- 
acteristics of the graphical analysis that have 
already been described. 

A further point to be noted is that the 
analysis of extinction shows no Reinforce- 
ment effects and, in particular, no significant 
Day X Reinforcement interactions, which 
would, if they had been present, indicate a 
within-S PRE. 
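
The logic of reading a Day X Reinforcement interaction as a difference in extinction slope can be illustrated with a rough sketch. It is not the analysis of variance reported here: it fits a least-squares line to speed over extinction days for each stimulus and compares the two slopes. The speed values are invented for illustration and are not data from the experiment.

```python
def ls_slope(speeds):
    """Least-squares slope of speed against day index (0, 1, 2, ...)."""
    n = len(speeds)
    x_mean = (n - 1) / 2
    y_mean = sum(speeds) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(speeds))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


# Invented extinction speeds for one hypothetical S (arbitrary units).
partial_speeds = [3.0, 2.8, 2.6, 2.4, 2.2]
continuous_speeds = [3.0, 2.5, 2.0, 1.5, 1.0]

# A PRE-like pattern means a shallower decline to the partial stimulus,
# that is, a positive difference between the two slopes.
slope_difference = ls_slope(partial_speeds) - ls_slope(continuous_speeds)

if __name__ == "__main__":
    print(round(slope_difference, 3))
```

When the two slopes are about equal, as in the data described above, the difference is near zero and there is no basis for inferring a within-S PRE, whatever the difference in overall level.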

The only other significant effects occur in 
relation to color. During acquisition there 
is a significant Color Group effect in the 
goal measure during the first 20 days (p < 
.05), and in the start measure during the last 
half of acquisition (p < .05). This Color 
Group effect simply reflects a difference in 
absolute speed: the group run under B±W+
conditions tends to start faster than the
group run under W±B+ conditions, and a
difference in the same direction occurs in
the goal measure during the first 20 acquisi- 
tion days. In extinction there is a Day X 
Color Group interaction in all measures, as 
well as a Color Group effect in the start 
measure, which indicates that the W±B+
group extinguishes more rapidly (to both
stimuli) than the B±W+ group.

In Figure 6 we present extinction data for 
each of the 10 Ss of the experiment, sep- 
arately, for each of the three performance 
measures. Each panel of this figure portrays 
extinction data for a single S for a single 
measure to the two stimuli to which the S 
reacted in extinction, one associated with 
partial reinforcement, the other with con-
tinuous. If there is any pattern to these data
within-Ss, it is a suggestion, not borne out by 
the statistical analysis, that Ss extinguish 
faster to white than to black regardless of 
the pattern of reinforcement that has been 
associated with these stimuli during acquisi-
tion. The one fact that does seem to stand

Fig. 6. Individual S extinction data for all 10 Ss and for all three runway measures in Experiment 2.
The solid curve represents performance to S2+, the dotted curve to S1±. Data are plotted in 2-day
blocks of trials.


out very clearly is that there is an obvious 
difference between the color groups, extinc- 
tion slopes to both stimuli being generally 
much steeper for W±B+ than for B±W+.
This was, of course, supported by the signifi- 
cant Day X Color Group interaction of the 
statistical analysis. One could argue from 
this pattern of results, and from the lack of a 
Reinforcement X Day interaction, that for 
each S the rate of extinction to the two 
stimuli is not determined by the schedule of 
reinforcement associated with each but by 
the color of the partial stimulus, being 
greater for W± than for B±. The conjecture
then would be that the characteristics of the 
extinction response to the partially rein- 
forced color generalized to the continuously 
reinforced color within the same Ss, and 
that the Day X Color Group interaction 
which shows up as significant in two meas- 
ures is mainly due to the color of the partial 
stimulus. Another way of looking at this is 
that the white stimulus, relative to the black, 
is more effective in generating whatever the 


dominant tendency is at the moment; that 
in acquisition the white stimulus generates 
more “excitement,” while in extinction it 
generates more “inhibition.” 


INTRODUCTION TO EXPERIMENTS 3 AND 4 


The first two experiments involved a 
within-S acquisition procedure followed by 
within-S extinction. In Experiment 1 rein-
forcements were equated between S1 and
S2, whereas in Experiment 2 trials were
equated. The procedure of equating for 
reinforcements, which was also the procedure 
of the pilot experiment (Amsel et al., 1964), 
provided a replication of the pattern of 
acquisition effects in the original experiment 
when the stimulus conditions were the same 
(B±W+). There appeared in this experi-
ment to be some suggestion of a within-S
PRE. However, the extinction differences
were not as dramatic as those ordinarily seen
in between-S experiments; were not
significant in group analysis; and were, at
most, merely suggested by an examination
of the data from individual Ss.


16 A. AMSEL, M. E. RASHOTTE, AND J. R. MACKINNON


The second experiment, in which trials 
were equated, also provided a clear picture 
of more vigorous performance to the partial 
than to the continuous stimulus, but only in 
the run measure—neither the start nor the 
goal measure showed any evidence of dif- 
ferences of performance to the two stimuli. 
The extinction curves to S1± and S2+ were
virtually identical, and even an examination 
of the individual animal data revealed no 
evidence of differences in extinction to the 
two stimuli. 

Experiments 3 and 4 were conducted for 
the purpose of looking further into within-S 
PRF effects under conditions of equated 
trials to S1 and S2 (rather than equated
reinforcements). Each of these experiments
was of the S1±S2+ type and was conducted
with a four-trial-a-day procedure. Experi- 
ment 3 was run in the shorter version of the 
apparatus involving three runway measure- 
ments, while Experiment 4 was run in the 
longer, five-measurement runway. In Experi- 
ment 3, Ss were extinguished under a be- 
tween-S procedure following within-S ac- 
quisition (see Figure 1), that is to say, half 
of the Ss were extinguished only to S1 while
the other half were extinguished only to S2.
In Experiment 4 the extinction procedure 
was, as in the first two experiments, within-S, 
all Ss being extinguished to both S1 and S2.

The particular importance of Experiments 
3 and 4 is, however, that in both experiments 
between-S conditions were run in parallel to 
the within-S condition, so that some com- 
parisons of the results of the two procedures 
could be made. 


EXPERIMENT 3 


Method 


Subjects. The Ss were 40 experimentally naive, 
male albino rats, obtained from Woodlyn Farms, 
Guelph, Ontario. Their age at the beginning of 
the experiment proper was approximately 90 days. 
Two Ss died during the course of the experiment, 
leaving 38 Ss from which data were collected. 

Apparatus. The apparatus was the same as 
that described in Experiment 2. 

Procedure. Upon arrival at the laboratory, all 
Ss were assigned to individual living cages and 
placed on an ad libitum diet of food and water for 
2 days. Fifteen days prior to the beginning of 
acquisition training, Ss were placed on a 23-hr.
food deprivation schedule. During this period 


each S received a daily ration of 10 gm. of Purina 
lab chow in its home cage. Water was available 
at all times. During this 15-day adjustment phase 
of the experiment, each S received four 5-min. 
gentling sessions. No other habituation or pre- 
feeding procedures were carried out. 

The Ss were run in squads of 10, with animals 
from all experimental groups randomly assigned 
to each squad. The Ss were taken to the experi- 
mental room in a 10-compartment carrying cage, 
and waited for a period of 10 min. before the first 
trial of the day. Each S received two trials on 
Days 1 and 2, and four trials per day for the re- 
mainder of the experiment (152 acquisition trials 
in 39 consecutive days). During acquisition the Ss 
in any squad were run in a different order each 
day, and the minimum intertrial interval was 15 
min. 
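
The stated trial totals follow from the daily schedule, as a small arithmetic check shows. The function and parameter names are illustrative only; the day and trial counts are those given in the Method sections.

```python
def total_trials(days, reduced_days, reduced_per_day, full_per_day):
    """Total trials when the first `reduced_days` use a smaller daily count."""
    return reduced_days * reduced_per_day + (days - reduced_days) * full_per_day


# Experiment 3 acquisition: 2 trials on each of Days 1 and 2,
# then 4 trials per day for the rest of the 39 days.
experiment3_acquisition = total_trials(days=39, reduced_days=2,
                                       reduced_per_day=2, full_per_day=4)

# Experiment 2 acquisition: 2 trials per day to each stimulus for 42 days.
experiment2_per_stimulus = total_trials(days=42, reduced_days=0,
                                        reduced_per_day=0, full_per_day=2)

if __name__ == "__main__":
    print(experiment3_acquisition, experiment2_per_stimulus)
```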

The procedure on individual trials was exactly 
as described earlier, as were the manipulations 
of the experimenter before each trial which were 
designed to insure that olfactory and auditory 
cues, and visual cues other than those provided 
by the runways, were approximately constant 
from trial to trial and not differentially related
to reward and nonreward. 

The Ss were randomly assigned to three experi- 
mental groups, corresponding to the designations 
CP, PP, and CC of Figure 1. In terms of our 
generalized notation, Group CP becomes S1±S2+;
Group PP, S1±S2±; and Group CC, S1+S2+.
The first is the within-S group, and the second 
and third are the “controls” we selected to run, 
a between-S partial group and a between-S con- 
tinuous group with within-S exposure to two 
stimulus colors. 

Group S1±S2+ corresponds to the condition
run as Experiment 2: running to one runway
brightness (S1±) was partially reinforced, and
running to the other runway brightness (S2+)
was continuously reinforced. For 10 of the 19 Ss
in this (within-S) group, S1± was the white alley,
and S2+ was the black; for the other 9 Ss these
relationships were reversed. On any experimental
day, each S of Group S1±S2+ ran 2 trials to S1± and 2
trials to S2+. The 12 possible orders of daily
presentation of trials were randomly assigned to
successive 12-day blocks of training for Ss under
one color condition (e.g., B±W+). Ss running
to the other stimulus arrangement received the
reverse order of trials with respect to color. (For
instance, if the Ss running to B± ran B−W+
W+B+ on a daily block of trials, the Ss running
to W± ran W−B+B+W+).

The Ss in Group S1±S2± were rewarded 50%
of the time to both the black and white stimuli,
while Ss in the third group (S1+S2+) were con-
tinuously reinforced for approach to both stimuli.
These two groups together constituted a be-
tween-S experiment run under conditions as
similar as possible to the within-S procedure.
(See our discussion of this problem in the intro- 
ductory comments.) 

Thirty-six extinction trials were run in 9 consecutive
days. The Ss were extinguished to one
stimulus only. Half of the Ss in each of the three
groups were extinguished in the S1 alley and half
in the S2 alley.


Results 


In this experiment, as in Experiment 4, we 
will be comparing data collected under 
within-S and between-S conditions. Figure 7 
shows the results of this experiment for all 
measures, both in acquisition and in extinc- 
tion, and for between-S and within-S condi- 
tions. The acquisition data are plotted in 
2-day blocks and the extinction data in 
daily blocks of trials. It is important to 
remember that in Figure 7 each panel repre- 
sents, for one measure, one color version of 
the within-S experiment (e.g., B±W+)
with its associated between-S counterpart. 
The between-S experiment was described in 
the introduction as CC versus PP. CC refers 
to the fact that a group of Ss is run under 
continuous reinforcement conditions to two 
separate stimuli (B+W+); PP is the condi-
tion in which the same Ss are run under
partial reinforcement conditions to two
stimuli (B±W±). In each panel, therefore,
we compare our within-S condition
(W+B±) to Ss with the same reinforce-
ment-color combinations from the between-S
condition (e.g., W+ from the CC group and
B± from the PP group). Likewise, the
W±B+ within-S condition, in the left-hand
panels of Figure 7, is compared with re-
sponses to B+ from the CC condition and
W± from the PP condition. These kinds of
arrangements (which hold in this experiment 
and the next) seem to us as good as can be 
made for comparing data from within-S 
and between-S PRF experiments. 

As we look at these data we should re- 
member that, in this experiment, extinction 
was between-S in all cases; that is to say, 
each extinction curve represents a separate 
group of Ss extinguished to only one color. 
In all cases, every S saw both colors during 
acquisition. Therefore, the only factor that 
differentiated Ss during extinction was 
whether their acquisition had been within-S 
or between-S. It boils down to this question: 
Will Ss who acquire a response under 
W±B+ conditions and are then split in
extinction, so that half of them extinguish to
W and half to B, show the same extinction
differences as might appear between a group
extinguished to B after W+B+ acquisition
and a group extinguished to W after W±B±
acquisition?


...arately for each of the color conditions of Experiment 3. Acquisition data are plotted in 2-day blocks of
trials, extinction data are plotted in daily blocks.

Extinction. Looking first at the extinction 
data, it is clear that the question we just 
posed must be answered in the negative. 
Comparing the S1±S2± and S1+S2+ groups
(open circles) there is clear evidence of the 
typical PRE in every panel of the graph even 
though all Ss were exposed to two stimuli 
during acquisition rather than (the usual) 
one and were then extinguished under only 
one of the two stimuli. What seems fairly 
clear from these data is not only a difference 
in slope but also a difference in acceleration 
(and this may turn out to be the most un- 
mistakable indicator of the genuine PRE): 
the continuous group in extinction shows a 
strong negative acceleration, whereas in the 
partial groups acceleration seems to be, if 
anything, positive. 

Between-S extinction following within-S 
PRF training seems not to show these effects, 
although at first glance there appear to be 
differences between the extinction groups. 
The left-hand panels show the partial- 
stimulus group apparently extinguishing 
more slowly than the continuous-stimulus 
group whereas the right-hand panels seem 
to show the reverse. However, a closer ex-
amination shows that there are no slope
differences between these extinction groups 
following within-S acquisition, and that, in 
fact, the slopes of both groups seem very 
similar to the slope of the comparison be- 
tween-S partial group in extinction. Further 
examination of these extinction data will
show that the apparent differences in be- 
tween-S extinction that show up in the 
groups after within-S acquisition are largely, 
if not entirely, due to color, the case being 
that Ss extinguished to white show generally 
higher performance levels in extinction than 
the Ss extinguished to black. This shows up 
in the statistical analysis* as a significant
Color effect in the start (p < .01) and
run (p < .05) measures and an almost
significant effect in the goal measure (p <
.10). There are no significant interactions
between Color and Day or between Color
and Reinforcement in any analysis. This
would suggest that the observed differences
between within-S partial and continuous
curves in extinction are due to color alone,
and that in the absence of such color effects
the two within-S extinction curves are almost
inseparable from one another and from the
between-S partial extinction curve. The
combined curves are shown as Figure 8, and
the extinction curves strongly support these
conclusions.


* The analyses of variance for extinction in this
experiment are all between-group analyses since
in the within-S groups half of the animals are
extinguished to each color. As a consequence, the
between-S factors in both within- and between-S

On the other hand, analyses of variance of 
the data from extinction following between-S 
acquisition show significant Reinforcement 
effects in all measures (p < .001, .01 and .05 
for start, run, and goal measures respec- 
tively), as well as significant Reinforce- 
ment X Day interactions in all measures 
(p < .001 in start and run and < .05 in goal), 
this latter reflecting differences in slope and 
being the critical indicant of the PRE. 
There are also in the between-S extinction 
data significant Reinforcement X Color
(p < .05) and Color X Day (p < .01) inter-
actions in the running measure; and a
significant triple interaction among Rein-
forcement, Color, and Day in the goal
measure (p < .05).

What all of this suggests is that (a) the 
usual PRE shows up very clearly following 
between-S acquisition but that there is no
indication of a PRE following within-S
acquisition; and (b) color is a factor in both
the within- and between-S groups in extinc-
tion but affects each differently. While color
interacts with both Reinforcement and Day 
in between-S extinction following between-S 
acquisition, it does not interact with any 
factors in between-S extinction following 
within-S acquisition. 

Acquisition. The acquisition data of Figure
7 present a rather complex picture, and we
will be interested in the between-S effects,
the within-S effects, and a comparison of the
two.


extinction analyses are Reinforcement, Color,
and Reinforcement X Color. The within-S factors
are Day, the interactions Day X Reinforcement
and Day X Color, and the triple interaction.


Fig. 8. Within- and between-S acquisition and extinction data from Experiment 3 combined across
color conditions.

Between-S effects can be summarized 
rather quickly. In every measure for both 
color groups there appears to be a difference 
in speeds in all cases in favor of the con- 
tinuous stimulus. This is not the ordinary 
partial reinforcement acquisition effect as 
found in experiments of Goodrich (1959), 
Wagner (1961), and others. While we do find 
a progressive decrease in this difference as we 
go from the goal to the start measure, until 
at the start the difference is very small, there 
is no evidence of the actual crossover effect 
in acquisition that sometimes shows in the 
starting and running measures. The acquisi- 


tion data are plotted in 19 2-day blocks, and 
for purposes of statistical analysis the entire 
period was divided in half, ignoring the first 
block of trials. A separate statistical analysis 
was made for each measure and for the two 
halves of acquisition (Days 3-20 and Days 
21-38) which correspond to Blocks 2-10 and 
11-19. 
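
The correspondence between blocks and days stated above can be verified with a minimal sketch, assuming only that block b of a 2-day blocking covers Days 2b-1 and 2b. The function names are illustrative.

```python
def days_in_block(block, block_size=2):
    """Days covered by a 1-indexed block of `block_size` consecutive days."""
    first_day = (block - 1) * block_size + 1
    return list(range(first_day, first_day + block_size))


def day_span(first_block, last_block, block_size=2):
    """First and last day covered by a run of consecutive blocks."""
    return (days_in_block(first_block, block_size)[0],
            days_in_block(last_block, block_size)[-1])


if __name__ == "__main__":
    print(day_span(2, 10))   # first analyzed half of acquisition
    print(day_span(11, 19))  # second analyzed half of acquisition
```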

The analyses of variance for acquisition 
were done separately for within- and be- 
tween-S data. In the between-S analyses 
Reinforcement is a between-S factor, and 
Day and Color are within-S factors; in the 
within-S analyses Color Group is the be- 
tween-S factor, and Day and Reinforcement 
are within-S factors. 
Analyses of the between-S acquisition
data showed a very clear picture. In all 
measures and for both segments the Day 
effect was significant. In the start measure 
the Reinforcement effect, partial versus 
continuous, was not significant from Days 
3-20 but was significant at the .05 level for 
Days 21-38. The same pattern of significance 
for the Reinforcement factor was found in 
the run measure: it was not significant over 
the first segment but was significant at the 
.05 level for the second. In the goal measure, 
however, Reinforcement was significant 
beyond the .01 level in both the early and the 
late segments of acquisition. The only other 
significant effects in the between-S data were 
Color effects in the first segment of acquisi- 
tion in the run measure (p < .01), a signifi- 
cant Color effect in the first segment of the 
goal measure (p < .01), and a significant 
Color X Day interaction (p < .01) in the 
first segment of acquisition in the run 
measure, and in the second segment in the 
goal measure (p < .05).

The within-S curves of Figure 7 (solid 
dots) present an acquisition picture which is
not too dissimilar to the between-S picture 
at the end of acquisition but which differs 
considerably early in training. By the end of 
acquisition both groups show, in all meas-
ures, faster responding to S2+ than to S1±,
but these differences are considerably smaller
than those observed in the between-S
curves. There is, then, at the end of acquisi-
tion, no evidence of the partial reinforcement
acquisition effect (faster to S1±) observed in
the early alley measures in Experiments 1 
and 2, but a clear goal measure effect is in 
evidence. Earlier in acquisition there is some 
evidence for this effect in the running meas- 
ure when white is the Sit, and faster speeds 
to the partial stimulus are clearly seen in 
the goal measure early in acquisition, that 
is, up to about the fifth or sixth acquisition 
block. In the start and run measures both 
within-S curves follow, roughly, the contour 
and level of the between-S partial curve, and 
in the goal measure the within-S curves begin 
and remain approximately intermediate 
between the between-S curves. 

Analyses of variance were conducted for 
each measure and over the same two segments 
of acquisition as were analyzed for 
the between-S groups. In each measure and 
in both the first and second halves of acquisition 
there is a significant Day effect (p < 
.001 in all cases). In addition to this overall 
effect a pattern of significance related to 
Reinforcement emerges. In the start measure 
Reinforcement is significant in the first 
segment of acquisition (p < .05) and just 
fails to reach significance in the second 
segment, reflecting an overall tendency for 
starting to the partial stimulus to be slower 
than to the continuous stimulus. A significant 
Reinforcement X Day interaction 
(p < .001) in the second segment of acquisition 
(Blocks 11-19) reflects faster starting 
to the partial stimulus beginning about 
Block 10 followed by relatively marked 
slowing about Block 15 which persists until 
the end of acquisition. In the run measure, 
Reinforcement fails to reach significance as a 
main effect in either segment of acquisition 
but does appear as a Color Group X Reinforcement 
interaction (p < .01) in the first 
half of acquisition and a Day X Reinforcement 
interaction (p < .001) in the latter 
part of acquisition. The first interaction 
reflects the tendency for running to S1± to 
be (a) faster than to S2+ for the W±B+ 
group and (b) slower than to S2+ for 
the B±W+ group during the first half of 
acquisition, again suggesting that the white 
stimulus seems to energize the dominant 
response tendency. The second interaction 
reflects the more rapid slowing to S1± as 
opposed to S2+ as training progresses, 
though performance to both tends to grow 
weaker as training progresses. This finding 
was also evident in the start measure. In 
the goal measure the main effect of Reinforcement 
was not significant in the first 
segment of acquisition, but a Color Group X 
Reinforcement interaction (p < .05) and a 
Day X Reinforcement interaction (p < .01) 
reflect the clearer difference between partial 
and continuous curves very early in training 
in the B±W+ group. In the second half of 
acquisition, however, Reinforcement and 
Day X Reinforcement are significant (p < 
.001 and .05, respectively), reflecting the 
differences between groups and across days 
seen in Blocks 11-19 in Figure 7. The main 
difference between the color groups again 
appears to be relatively slower running to 
W± than to B±, suggesting that the inhibitory 
effect near the goal in within-S acquisition 
is greater and sets in earlier when white 
is partial. 

Attention should be called to the difference 
between the kind of result obtained in 
within-S training and that obtained in 
between-S training in the earliest blocks of 
the goal measure. In between-S training, 
performance for the continuous group is 
superior from the very outset to that of the 
partial group. In within-S training, where the 
same Ss run to both partial and continuous 
reward the picture is a rather different one. 
(This same effect is apparent in Figures 2 and 
5 of Experiments 1 and 2, respectively; 
however, the between-within comparison 
of Experiment 3 makes it stand out even 
more clearly.) At a relatively early stage of 
training, Ss run faster to S1± than to S2+; 
then, between the fifth and sixth blocks of 
trials, the curves cross and the partial curve 
remains below the continuous curve until 
acquisition training is terminated. This 
relationship can be seen somewhat more 
clearly in Figure 8 where the color curves are 
combined. Estimates of the slope for in- 
dividual Ss of the within-S group over the 
first five blocks of acquisition trials were 
obtained, and a subsequent test of the differences 
in slope to S1± and S2+ approached 
significance. 


EXPERIMENT 4⁵ 


Experiment 4 is a combination of a 
within- and a between-S experiment con- 
ducted at separate times and is, in some 
respects, like the previous three but in some 
respects different from each of them. Like 
Experiment 3 it contains both within-S and 
between-S conditions, but unlike Experiment 
3 (a) it is run in the longer (five-segment) 
runway to get a better look at transitional 
effects from one segment to the next in 
acquisition, and (b) it involves a within-S 
rather than a between-S extinction procedure. 
In these latter respects, then, the 
present experiment is like Experiment 1; but 
unlike Experiment 1, trials rather than 
reinforcements are equated in the within-S 
condition, and a between-S comparison 
condition is added. 


5 We are indebted to Bohdan P. Kolesnik, who 
assisted in the collection of data in this experiment. 


Method 


Subjects. The Ss in each experiment, within- 
and between-S, were 20 experimentally naive male 
albino rats, about 110 days old at the beginning 
of experimental training. During the course of 
training one S died in the within-S experiment 
and one S was discarded in the between-S experi- 
ment because of illness, leaving 19 Ss from which 
data were collected in each experiment. 

Apparatus. The apparatus was the same as 
that used in Experiment 1. 

Procedure. Three weeks before the beginning 
of each experiment Ss were placed on a 23-hr. 
food deprivation schedule. During this period 
each S received a daily ration of 11 gm. of Purina 
lab chow in its home cage. Water was available 
at all times. During this 3-week period each S 
was handled every 2 days for a few minutes. No 
habituation or prefeeding procedures were carried 
out. 

In each experiment the original random assignment 
of Ss was into two groups of 10 each. As in 
Experiment 3, one within-S group ran under 
B±W+ and the other under W±B+ conditions, 
while one between-S group ran under B+W+ and 
the other under B±W± conditions. 

In each experiment Ss were run in two squads 
of 10, with 5 Ss from the B±W+ and W±B+ 
groups in each of the two squads in the within-S 
experiment, and 5 Ss from the B+W+ and B±W± 
groups in each of the two squads in the between-S 
experiment. Squads were transported to the ex- 
perimental room in a 10-place carrying cage and 
waited 10 min. before the first trial of the day. 

A total of 160 acquisition trials were run in 
each experiment, followed by 48 extinction trials 
in the within-S experiment, and 64 extinction 
trials in the between-S experiment. In both ex- 
periments there were four trials on each day, two 
to S1 and two to S2. Speed data are presented 
from the between-S experiment for only the first 
48 extinction trials, since no significant speed 
differences occurred after this point. The additional 
trials were for the purpose of gathering 
retrace data which are a subject of analysis in 
this experiment. 

In the within-S experiment the order of pres- 
entation of stimuli was determined by randomly 
assigning six permutations of four trials (B+, 
B—, W+, W+) to days within successive 6-day 
blocks over the duration of the experiment. This 
randomization was done separately for each S. 
The six orders of stimulus presentation were 
obtained by arranging the trials with the restriction 
that the first and second trials of each day be to 
different stimuli. In the between-S experiment 
the same order of presentation of stimuli was 
employed; but for the B+W+ group all trials 
were rewarded, while for the B±W± group only 
one trial to each stimulus was rewarded on each 
day. The Ss were run with a minimum intertrial 
interval of 15 min. The details of procedure on 
any individual trial were exactly as described in 
the first three experiments. 
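
The block randomization described above can be sketched as follows. This is only an illustration under stated assumptions: the six admissible daily orders are not listed in the text, so the labels in `ORDERS` are hypothetical placeholders; the 52-day length is derived from the 160 acquisition and 48 extinction trials of the within-S experiment at four trials per day; and truncating the final, incomplete 6-day block is an assumption, since the text does not say how leftover days were handled.

```python
import random

# Hypothetical labels for the six admissible daily trial orders; the source
# does not list the orders themselves.
ORDERS = ["order_1", "order_2", "order_3", "order_4", "order_5", "order_6"]


def assign_orders(n_days, rng):
    """Return one order label per day, shuffling the six orders
    independently within each successive 6-day block."""
    schedule = []
    while len(schedule) < n_days:
        block = ORDERS[:]      # every block contains each order exactly once
        rng.shuffle(block)     # random assignment of orders to days in the block
        schedule.extend(block)
    # Assumption: a final partial block is simply truncated.
    return schedule[:n_days]


def schedules_for_subjects(subject_ids, n_days, seed=0):
    """Randomize separately for each S, as the procedure specifies."""
    rng = random.Random(seed)
    return {s: assign_orders(n_days, rng) for s in subject_ids}


# (160 acquisition + 48 extinction trials) / 4 trials per day = 52 days
N_DAYS = (160 + 48) // 4
schedules = schedules_for_subjects(range(1, 21), n_days=N_DAYS)
```

Because each S receives an independent shuffle, any two Ss will generally meet the six orders in different sequences, while every complete 6-day block still contains each order once.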

One S in the W±B+ group of the within-S 
experiment died during the course of training. 
For purposes of statistical analyses the S with 
the same number in the B±W+ group was removed; 
however, all 10 Ss from this latter group 
are included in the graphical analyses. Similarly, 
in the between-S experiment one S from the partial 
group was discarded due to illness, and the data 
of the S with the same number in the continuous 
group were removed for the statistical analysis 
but left in the graphical presentation of the re- 
sults. Group Ns were equalized in the same way 
to allow meaningful comparisons of retrace data 
between groups. 

The results of both experiments were combined 
to make within- and between-S comparisons from 
the same conditions and together will be referred 
to as the results of Experiment 4. 


Results 

The results of Experiment 4 will be pre- 
sented in somewhat more detail than the 
others because it allows for the largest 
number of comparisons. We will begin with a 
detailed graphical description (Fig. 9) of the 
within-S and between-S data both for 
acquisition and extinction showing the 
various relationships for all five measures. 
The summary of an analysis of variance of 
the data from the within-S portion for all 
measures is reported in Table 1, and from 
the between-S portion in Table 2. Then, in 
Figure 10, we will present the data from the 
within-S portion of the experiment in a 
manner which stresses the changing relationships 
between response strength to S1 and 
S2 that occur across successive segments or 
measures of the runway. In Figure 11 extinc- 
tion data are presented for individual Ss in 
the within-S condition for the goal measure 
only, and finally, Figures 12, 13, and 14 
present a variety of comparisons based on a 
retrace measure taken for the first time in 
the between-S groups in this experiment. 
This measure is simply a count of the number 
of trials on which Ss retrace in the alley, that 
is, the number of trials on which they 
stopped, turned, and went in the opposite 
direction from the goal. 


Figure 9 is a summary of all of the speed 
data in this experiment. It contains 10 
separate acquisition-extinction panels, one 
for each of five measures for each of the two 
relationships between color and reinforce- 
ment. The acquisition data are plotted in 
4-day blocks of trials and the extinction data 
in 2-day blocks. While there are differences 
from panel to panel, these data have certain 
outstanding features. The most prominent 
single feature is the difference, in all meas- 
ures, between the continuous between-S 
curve and all of the others, both in acquisi- 
tion and extinction. In acquisition the indica- 
tion is that, as in Experiment 3, the be- 
tween-S continuous group shows greater 
vigor than any of the others, while in extinction 
the indication is that this group 
extinguishes more rapidly than any of the 
others. The other features of note concern 
within-S and between-S comparisons, sep- 
arately. 

Within-S acquisition and extinction. The 
pattern of results in the within-S curves, 
both across color conditions and across 
performance measures (within color condi- 
tions) is very similar to that of Figure 2 of 
Experiment 1. While the differences are not 
so pronounced as in the earlier experiment, 
the indications are the same: the paradoxical 
effect, faster running to the partial than to 
the continuous stimulus, is clearer when 
S1± is white than when it is black, and this 
difference under the W±B+ condition 
disappears and even reverses slightly in the 
goal measure. When S1± is black (B±W+) 
there is little if any indication of faster 
running to the partial stimulus, but as we 
go from the start measure to the goal measure 
there is a gradual divergence of the 
curves toward faster running to S2+. These 
relationships were made the subject of 
analyses of variance, a summary of which we 
present in Table 1. In these analyses, as can 
be seen from the table, Color Group is a 
between-S factor and Day and Reinforcement 
are within-S factors in the acquisition 
and extinction analyses. Analyses were 
performed for each measure at three separate 
stages of the experiment: from the beginning 
to the middle of the acquisition period 
(without the first 2 days), from the middle to 
the end of the acquisition period, and for all 
of extinction. 

Fig. 9. Within- and between-S comparisons of acquisition and extinction performance shown separately 
for the two color conditions of Experiment 4. Acquisition data are plotted in 4-day blocks of 
trials (means of eight partial and eight continuous trials) and extinction data are plotted in 2-day blocks. 

In the analyses there was no main effect 
of Color Group in any measure in any seg- 
ment of the experiment. There was a reliable 
effect of Day throughout the experiment, 
reflecting only the changes which occur in 
acquisition and extinction. The Color 
Group X Day effect is significant only in 
the start measure during the latter half of 
acquisition. 

The pattern of significance related to the 
Reinforcement factor is one which reflects 
the graphical differences between groups 
shown in Figure 9. Reinforcement is significant 
as either a main effect or interaction in 
all but the start measure in the first half of 
acquisition and in the Run III and goal 
measures during the remainder of acquisition. 
The Reinforcement main effect fails to 
reach significance in the Run I and Run II 
measures. The interactions reflect the 
changes in relationship between the partial 
and continuous curves that are evident 
between the groups, especially in the three 
run measures, there being a clear superiority 
of the partial curve in the W±B+ group, 
with the difference between the curves 
tending in the opposite direction in the 
B±W+ group. 

In extinction the triple interaction of 
Color Group, Day and Reinforcement is 
significant in the start and Run I measures. 
These interactions reflect the tendency of 


TABLE 1 
Summary of Analyses of Variance for Within-S Groups of Experiment 4 
(Sources of variation: Color Group, Day, Color Group X Day, Reinforcement, Color Group X 
Reinforcement, Day X Reinforcement, and Color Group X Day X Reinforcement, each with its error 
term. F ratios are given for the start, Run I, Run II, Run III, and goal measures in acquisition 
Days 3-20, acquisition Days 21-40, and extinction.) 
* p < .05. 
** p < .01. 
*** p < .001. 



Fig. 10. Within-S acquisition data for the two color conditions of Experiment 4 replotted to emphasize 
vigor relationships between S1± and S2+ in the five segments of the alley. Acquisition data are 
plotted in 8-day blocks of trials, extinction data in 4-day blocks. (Panels: acquisition and extinction 
blocks for the W±B+ and B±W+ conditions; abscissa: alley measures, start through goal; curves: 
continuous and partial.) 


the partial curve to come close to the continuous 
curve as extinction progresses and 
even to cross it in the W±B+ group while 
the partial curve generally remains above the 
continuous curve, although parallel to it, in 
the B±W+ group. The Day X Reinforcement 
interaction in the extinction analyses 
is significant at the .05 level in both the Run 
III and goal measures, the continuous curve 
in the goal measure crossing over the partial 
curve, although the differences are by no 
means great, especially in comparison to the 
between-S differences. There is, then, to this 
mild extent a suggestion of slope differences 
and, therefore, of the presence of a PRE-like 
effect. The significant main effect of Reinforcement 
in extinction seems to reflect only 
acquisition levels of performance, particularly 
in the W±B+ group. 

The large color effect observed in extinc- 
tion of the within-S groups of Experiment 3 
was not found here, although there are some 
indications of color effects in the early 
measures. However, extinction in Experiment 3 
was between-S while extinction in 
the present experiment was within-S. It is 
possible that, in Experiment 3, color effects 
are enhanced because, after acquisition to 
both colors, extinction trials are to one color 
only. No significant Color Group X Day 
interactions were obtained in extinction in 
Experiment 4 although they were found in 
Experiments 1 and 2. 

To summarize, the analyses of variance 
suggest that the within-S differences which 
appear in Figure 9 are reliable; that there are 
significant partial reinforcement acquisition 
effects within-S similar to those shown in 
Experiment 1, and that these seem, again, 
to depend somewhat on color; and that there 
are significant differences in extinction due to 
the reinforcement variable in acquisition, 
but that these cannot be thought of as 
depending on anything but the terminal 
level of the partial and continuous curves in 
acquisition. The failure to find a consistent 
or strong Day X Reinforcement interaction 
reflects little, if any, PRE in the sense of 
differences in slope. There is, however, the 
suggestion of such an effect. 

Fig. 11. Individual animal extinction data for the goal measure for Ss from the within-S groups of 
Experiment 4. The data are plotted in 2-day blocks of trials. The solid curve represents S2+ performance, 
the dotted curve S1± performance. 

Figure 10 is a graphic reanalysis of the 
within-S data of Figure 9. This entire portion 
of the experiment is broken down into eight 
blocks of trials, five acquisition blocks (each 
consisting of 8 days) and three extinction 
blocks (each consisting of 4 days). In each 
panel, speed of response to S1 and to S2 
are plotted against the five successive measures 
from start to goal. The very subtle 
changes in the relationship between the 
strength of response to S1 and to S2 over the 
course of the experiment, and the interaction 
of these changes in relation to the measure of 
performance taken, can be seen in a graphic 
analysis such as this. In such an analysis the 
characteristics of the data which have been 
confirmed by the analysis of variance become 
clearer, particularly in the upper panel 
describing the condition in which S1± is 
white and S2+ is black. The first block of 
trials shows the partial stimulus evoking 
faster running in the goal measure and the 
adjacent run measure. By Block 2 there is 
faster running in all measures except for the 
start measure to S1±; by Block 3 the relationship 
in the goal measure is reversed and 
S2+ now produces higher performance, and 
these effects hold through Blocks 4 and 5 of 
acquisition. The general contours of the 
curves from panel to panel indicate (a) that 
the inverted U-shaped function relating 
speed to successive measures increases in 
overall height as we go from Block 1 to 
Block 5 of acquisition, and (b) that the func- 
tion becomes more peaked as acquisition 
training progresses, the fastest performance 
always being in the middle segment of the 
alley. The curves are somewhat asymmetrical: 
there is a consistent difference in speed 
between the first and third run measures, the 
first showing faster speed than the third; 
and there are also slightly higher speeds in 
the goal than in the start measure. These 
characteristics of the data are similar to 
those reported by Weiss (1960) in a runway 
experiment. The picture in extinction is 
(a) of course a decline in the levels of the 
curves from the first to the third extinction 
blocks, (b) a decline also in the peakedness 
and symmetry of the functions, and (c) a 
somewhat clearer indication than in Figure 
9 of faster running in extinction to the partial 
stimulus than to the continuous stimulus. 

The panels of Figure 11 show individual S 
data for the last block of acquisition trials 
and for each of the extinction blocks for the 
goal measure. There is a slight overall 
tendency for the partial curves (dashed line) 
to lie somewhat above the continuous ones. 
The differences are not great, and there are 
few indications of difference in slope. 

Between-S acquisition and extinction. The 
differences in the between-S curves (Figure 
9) are very obvious and clearly replicate the 
results of Experiment 3 in every respect. 
First of all, every measure but the start 
measure shows that the continuous group 
performs more vigorously than the partial 
group and that there is no suggestion of the 
Goodrich type of crossover effect in these 
between-group data. There is a very clear 
PRE, the continuous group curve dropping 
very abruptly at the first and second extinc- 
tion points, the partial curve showing practi- 
cally no drop at all at these two points. What 
continues also to be remarkable in this 
experiment, as it was in Experiment 3, is the 
similarity of levels and slopes of the two 
within-S extinction curves and the extinction 
curve from the partial between-S group. 
There is some suggestion (as there is also in 
the extinction comparisons of Figure 8) that 
the between-S partial curve reflects greater 
resistance to extinction than the two 
within-S curves. This shows up particularly 
in the Run I, Run II, and goal measures in 
the later extinction trials, where the slope 
of the within-S curves lies somewhat inter- 
mediate between the slopes of the between-S 
partial and continuous curves. 


Table 2 provides a summary of the be- 
tween-S analyses of variance. Analyses were 
conducted over Days 1-20 (Blocks 1-10 in 
Figure 9), Days 21-40 (Blocks 11-20 in 
Figure 9), and over the extinction days for 
each measure. In these analyses Reinforce- 
ment is a between-S factor and Day and 
Color are within-S factors. The table shows 
that Reinforcement is significant in both 
segments of acquisition in all but the start 
measure where it does not reach significance 
in either segment. This effect reflects the 
obvious acquisition differences shown in 
Figure 9 between partial and continuous 
performance. There is a significant Day 
effect in all measures in both segments of 
acquisition, and a Reinforcement X Day 
effect in the first segment of acquisition in 
the goal measure. The Day effect reflects 
continuing changes throughout acquisition 
and the interaction reflects a slower rate of 
acquisition of the response by the partial 
group in the goal measure. 

In the first segment of acquisition a main 
effect of Color appears only in the goal 
measure, and there is a Day X Color interac- 
tion at the .05 level in the first running 
measure. Several color interactions occur in 
the second segment of acquisition: Reinforce- 
ment X Color in the start and Run I meas- 
ures, and Day X Color in all but the Run 
III measure. These color effects do not show 
any particularly consistent pattern either 
across measures or across acquisition seg- 
ments, but Figure 9 does reflect a tendency 
for performance to the white stimulus to be 
more vigorous when it is partially rewarded 
and to be less vigorous when continuously 
rewarded, especially in the second segment. 

In extinction the Reinforcement, Day, and 
Day X Reinforcement effects are all signifi- 
cant in every measure, the latter effect 
reflecting differential slopes of partial and 
continuous curves and thereby substantiat- 
ing what is already obvious from Figure 9— 
that a PRE occurred in the between-S 
groups. Color appears as a significant effect 
in extinction in a color main effect in Run 
III and in Color X Reinforcement interac- 
tions in the Run I, Run III and goal meas- 
ures. These color effects reflect somewhat 
more rapid extinction of the CRF group in 
the presence of the white stimulus. These 
same color effects are apparent in the retrace 
data. 

In general, the analyses of variance sup- 
port the pattern of relationships shown in 
Figure 9. 

Retrace data. In this experiment retrace 
data were taken only for the between-S 
groups. These data are presented here 
because they suggest a method of measuring 
extinction performance we had not used 
before, a method which provided such a 
clear-cut indication of the differences between 
the CRF and PRF groups in extinction 
that we decided to use it in later experiments 
for within- as well as between-S 
analyses. Figure 12 shows number of retraces 
by the continuous (C) and partial (P) groups 
in 2-day blocks over the entire extinction 
phase of the experiment. (No retracing was 
observed in either group except on the first 
few trials during acquisition.) In this figure 
the maximum number of retraces at each 
block is 72 which would be reached only if 
all Ss retraced on every trial. It is very 
evident that the CRF group retraces on far 
more trials than does the PRF group during 
extinction, and that frequency of retracing 
in the continuous group appears to increase 
in a somewhat negatively accelerated manner 
starting with the first 2-day block of trials. 
No retracing was observed in the partial 
group until the fourth block of trials. At the 
point at which we terminated extinction, the 
partial group was retracing at about the 
level of the continuous group at the first 
block. 
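
The ceiling of 72 retraces per 2-day block can be checked with a short calculation. The sketch below assumes 9 Ss per between-S group (19 Ss remained after one partial S was discarded, and group Ns were equalized for the retrace comparisons by dropping the same-numbered continuous S), four trials per day, and at most one scored retrace per trial; the function name is only illustrative.

```python
def max_retrace_score(n_subjects, trials_per_day, days_per_block):
    """Largest possible retrace count for one group in one block,
    scoring at most one retrace per S per trial."""
    return n_subjects * trials_per_day * days_per_block


# Assumed values: 9 Ss per group, 4 trials per day, 2-day blocks.
score = max_retrace_score(n_subjects=9, trials_per_day=4, days_per_block=2)
print(score)  # 72
```

Under these assumptions the product reproduces the maximum of 72 quoted for Figure 12.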

Figure 13 shows the data for retracing 
broken down according to the five measures 
taken in the runway. The most noteworthy 
feature of these data is that the retracing 
occurs primarily in the first two segments, 
with less in the last two segments and very 
little in the middle one. Early in extinction 
retracing is seen first in the goal region and 
continues to occur there; but as training 
progresses, occurrence of retracing tends to 
be more and more in the first segments of 
the runway. Late in extinction there is less 
retracing at the start and increased retracing 
in the goal segment. 

Finally, Figure 14 breaks retracing down 
according to the color of the alley in which it 
occurs. There is some suggestion that the 
Ss tend to retrace more in the white alley 
than they do in the black. At least this is a 
consistent picture occurring in every one of 
the 2-day blocks for the CRF Ss. However, 
for each of the alley colors, separately, there 
is an increase of about the same form in 
retracing from the beginning to the end of 
extinction. 

Fig. 12. Number of retraces in extinction by 
between-S groups of Experiment 4. Data are based 
on 2-day blocks of trials. Only the first retrace on 
any trial was scored for each animal and a maximum 
score (all Ss retracing on all trials) for a 
2-day block is 72. 

Fig. 13. Number of retraces by between-S 
groups of Experiment 4 replotted to show segment 
of alley in which retraces occurred. 

Fig. 14. Number of retraces by between-S 
groups of Experiment 4 replotted to show black 
and white alley retraces. 


Discussion 


The experiments were performed to extend 
the generality of a finding (Amsel et al., 
1964) that PRF acquisition effects are dis- 
cernible in within-S experiments; to investi- 
gate extinction effects following within-S 
acquisition; and to compare results obtained 
under within-S and between-S conditions in 
both acquisition and extinction. In making 
these comparisons, we came across some 
interesting and unexpected findings, and 
these have raised a number of questions 
which can only be answered by further work, 
some of which is now under way in our 
laboratory. 

Absence of a “PRE” within-Ss. Why is 
there little, if any, evidence for a difference 
in extinction to two stimuli, one of which has 
been related to PRF and the other to CRF 
in within-S partial reinforcement acquisi- 
tion? In terms of our within- versus be- 
tween-S comparisons, this question becomes: 
Why does extinction to each of the two 
stimuli look more like extinction after partial 
than like extinction after continuous rein- 
forcement? Two answers to this question can 
perhaps be provided by the theory with 
which we have been working (Amsel, 1958, 
1962), and one of these is shown schematically 
in Figure 15. The argument diagrammed 
in the figure is that the within-S 
case permits generalization of the persistence 
effect (rF-sF → approach) from S1 to S2 in 
extinction; that is, as soon as either stimulus 
elicits rF in extinction, the sF stimulus cues 
in the persistence mechanism regardless of 
whether the external stimulus has been 
associated with PRF or CRF training in 
acquisition. Since the within-S case permits 
mediated generalization of the mechanism 
for the PRE, we might expect PRE-like 
effects to both stimuli in extinction. Of 
course, the between-S case does not permit 
such generalization—PRF and CRF acquisi- 
tion develop in separate organisms, and 
transfer of persistence effects is therefore 
impossible under between-S conditions. We 
are suggesting that the critical factor determining 
resistance to extinction is whether 
sF elicits approach (as well as avoidance). 
It would seem, if our reasoning is correct, 
that this internal control of extinction 
behavior was the important factor in our 
within-S experiments, and that differential 
external stimulation was relatively unim- 
portant, even though other aspects of the 
data make it clear that Ss responded dif- 
ferentially to S; and S; in acquisition. 

Spear (1964) and Spear and Pavlik (1966) 
have reported extinction speed data from 
choice experiments which are consistent with 
our data and with an explanation in terms of 
mediated generalization. In these experi- 
ments, Ss were given equal experience with 
partially and continuously rewarded arms of 
a T-maze and were then extinguished in 
both arms of the maze. The extinction data 
showed PRE-like speeds in the continuously 
rewarded arm of the maze, and Spear (1964) 
suggests that the mechanism producing the 
PRE in the continuous arm appears to be 
independent of the external stimulus situa- 
tion. These findings are similar to those 
reported in the present experiments, and 
the theoretical schema provided in Figure 15 
makes explicit a mechanism for producing 
these effects which is independent of external 
stimulus factors. 

[Figure 15 diagram not reproduced: "Why does extinction after within-S
partial reward acquisition show no PRE?" Columns: Acquisition, Extinction.]

Fig. 15. A comparison of factors operating to affect extinction following between-S and within-S
acquisition.

Another answer to the question of why
extinction performance is the same to S1±
and S2+ involves the concept of primary
stimulus generalization of rF-sF in acquisition.
This second possibility, detailed elsewhere
(Amsel, 1966), holds that, if S1 and S2
are sufficiently similar that rF occurs not only
to S1± during acquisition but generalizes to
S2+, then sF will become conditioned to the
S2 response. We would then have no basis
for expecting a difference in rate of extinction
to S1 and S2. While this explanation clearly
differs from the mediated (or secondary)
generalization explanation outlined in Figure
15, the two are by no means incompatible;
both mechanisms might operate to produce
the within-S extinction result.

The recent data of Brown and Logan 
(1965) are similar in their import to the 
extinction data reported in this monograph. 
Their explanation for what they call the 
“generalized partial reinforcement effect,” 
following Logan, Beier, and Kincaid (1956), 
involves primary generalization of learned 
responses to “stop rG-ing” from the partial
to the continuous stimulus as the mechanism
which accounts for the within-S extinction
result. This explanation differs from
our primary generalization explanation in
that (a) the “stop rG-ing” response is not


anticipatory in character but, rather, occurs 
after the presentation of the goal event, and 
(b) this generalization occurs in extinction 
and not during acquisition. While we have 
also proposed a possible extinction interpre- 
tation of the generalized PRE, it involves 
mediated generalization rather than the 
primary generalization of the Brown and 
Logan explanation. 

We have embarked on some follow-up 
experiments to provide more information 
about the within-S extinction effect and its 
explanation. One approach we have tried is
to vary not merely a single stimulus in an
otherwise constant situation but grosser
aspects of the stimulus situation, even
to the extent that approach responses to the
differing situations will take different forms.

An experiment on within-S PRF training
and extinction (Rashotte, 1966) employed 
pairs of responses which differed in similarity 
in an attempt to increase the within-S 
PRE. There were two conditions in the 
experiment. In the first condition (R1/R2)
rats were trained to make similar running 
responses to approach food in two apparatuses
which differed not only in the
black-white dimension but also in other
respects (e.g., width of runway). In the
second condition (C/R2) Ss were trained to
perform two different responses to approach 
food: climbing in a black apparatus and 
running in a white alley. In both cases one 
S-R sequence was continuously rewarded, 
the other partially, and there were groups 
within each condition counterbalanced for 
the response-reward contingency. Between-S 
‘control’ groups were also included in the 
experiment. The extinction data from this 
experiment show an unmistakable within-S 
PRE, the difference between the conditions 
being that the PRE develops later in R1/R2
than in C/R2. The between-S controls
(each group also makes two responses, e.g.,
C+R2+ versus C±R2±) also show the
usual PRE, of greater magnitude than that 
found in within-S extinction after within-S 
acquisition. While this experiment does not 
allow us to choose between the primary and 
mediated generalization explanations, it 
does indicate that PRF-like extinction is 
not a necessary characteristic of responses 
acquired under continuous reward condi- 
tions in the within-S experiment. 

Our other approaches to investigating the 
within-S extinction phenomenon have in- 
volved the same apparatus as employed in 
the four experiments presented in this report. 
One idea has been to reduce generalization 
by separating out, into separate trial-time 
blocks, the black and white stimuli: only the 
black alley trials are given at one time of day 
and only white alley trials at another; one 
type of trial is always conducted by one 
experimenter, under specific background- 
noise conditions, while the other type of trial 
is given by a different experimenter in the 
presence of a different background noise. 
We are currently conducting this experiment 
under the conditions that, for example, all 
of the S1± trials (e.g., PRF trials to the
black stimulus) are run in the morning, and 
all of the S2+ trials (CRF trials to the white 
stimulus) are run 12 hours later, in the 
evening. A procedure such as this is very 
similar to some of the variety of classical 
conditioning experiments on “switching” 
reported by Asratian (1965) which appear 
to be successful in reducing stimulus generalization.
Some of these involve establishing a
conditioned response at two separate times, 
such as morning-afternoon, to the same 
conditioned stimuli but different uncondi- 
tioned stimuli. In one experiment, to quote 
Asratian, they demonstrated “that it is 
possible to form a positive and a negative 
conditioned reflex of the same kind to a 
single stimulus, i.e., that it is possible to 
switch conditioned reflexes from one func- 
tional sign to the opposite sign within the 
limits of a single type of activity of the 
organism. ..." The same investigator con- 
ducted two experiments with the same dogs 
in the same chamber, one in the morning and 
one in the afternoon, and the difference was 
only that in the morning all of the stimuli 
were reinforced, while in the afternoon one 
of the stimuli was not reinforced. They found 
that this stimulus acted as a positive CS in 
the morning, but like a negative CS in the 
afternoon. These same kinds of experiments 
were also performed with two different 
strengths or delays of UCR, each connected 
to a different conditioning chamber, to a 
different E, or to different times of day, e.g., 
morning-afternoon. The problem in these 
Pavlovian experiments is whether different 
strengths of CR can thus be conditioned to 
“the same CS," that is to say, whether the 
dog can learn different intensities or delays 
of the same conditioned response to the same 
conditioned stimulus when other “background”
stimuli (the chamber) and/or the
time of day are varied from one session to 
another. 

A third procedure for investigating the 
source of our within-S extinction finding 
involves a technique of predifferentiation of 
S1 and S2. We are conducting an experiment
in which Ss are first trained in a black-white
discrimination in our apparatus on the basis
of nonreinforcement to one stimulus (S1−)
and reinforcement to the other (S2+). After
this differentiation has been learned, the
stimulus associated with nonreward will be
switched to partial reward and a prolonged
period of S1±S2+ training ensues (a comparison
group will be switched to S1+S2+)
followed, in turn, by within-S extinction. 
We wonder whether the predifferentiation 
procedure will separate the action of the
exteroceptive stimulation sufficiently to
enable these stimuli to exercise greater
differential control over extinction behavior
and produce the PRE. Such a procedure has,
in effect, been employed by Stein (1957) in an 
experiment which might be regarded as 
demonstrating the within-S PRE in relation 
to the conditioned emotional response under 
free-responding conditions. 

A fourth approach to a better understand- 
ing of the within-S PRE is perhaps the 
simplest and most straightforward: variation 
in percentage reward to S1± in within-S
PRF acquisition prior to within-S extinction. 
True, a clear within-S PRE has been demon- 
strable in the Rashotte experiment when 
different responses, as well as different 
stimuli, were employed on the S1 50%-S2 100%
schedule. Nevertheless, failure to
demonstrate the effect convincingly when 
only one dimension of stimulation (black- 
white) serves as the differential cue may re- 
flect the particular reinforcement percentage 
of S1 employed. In a recent study, Henderson
(1966) gave within-S acquisition training to
groups run under a variety of S1 percentage
reinforcement conditions (0, 12, 25, 50, 100),
holding S2 constant across groups at 100%.
The finding was that response speeds to S2+
varied directly with percentage reward to
S1±. Unfortunately for our present purposes
Henderson did not run within-S extinction 
in the second phase of the experiment, but 
instead switched to a variant of discrimination
reversal (S1+S2−). We are presently in
the process of conducting the within-S-extinction
version of this experiment. The
Henderson acquisition finding does, inciden- 
tally, suggest that there is a generalization 
of inhibitory factors from S1± to S2+ during
acquisition. 

An experiment by Davenport (1963b) in 
which Ss were trained to run to two stimuli, 
one associated with 100% reward for all 
groups, the other with either 67, 33 or 0%, 
reports free choice data and also speeds 
obtained on forced trials to the two stimuli. 
These speed data of Davenport’s also suggest 
generalization of inhibitory factors from the 
lesser percentage stimulus to the 100% 
stimulus. These findings of Henderson and
Davenport lend support to the
primary-generalization-of-rF explanation proposed
earlier.

Pavlik and his co-workers have reported a 
variety of within-S extinction findings both 
from free-operant and from runway situa- 
tions. The data from the free-operant experi- 
ments have reflected both a reversed PRE 
(Pavlik & Carlton, 1965), and a conventional 
PRE (Pavlik, Carlton, & Manto, 1965). The 
former result was obtained when one lever 
was employed and time of exposure to S1 and
S2 (red and white arc lights) was equalized;
the latter result was obtained when both
number of responses and number of reinforcements
to S1 and S2 (rather than time of
exposure) were equalized. In the runway
experiment (Pavlik, Carlton, & Hughes,
1965), a reversed PRE was obtained only in a
goal measure which extended over the last 4 
ft. of a 6 ft. runway, rate of extinction to the 
partial and continuous stimuli being the 
same in the start measure. As we have men- 
tioned earlier the free-operant and discrete- 
trial experiments require different explana- 
tory emphases, and we will not attempt to 
pursue these here. The difference between 
runway results of Pavlik, Carlton, and 
Hughes and our own (as well as those of 
Brown & Logan) may well be due to pro- 
cedural differences noted in their discussion, 
particularly intertrial interval, relatively 
long in our experiments and very short in 
theirs. 

Greater vigor to S1± early in training near
the goal. A second way in which our within-S 
results look different from our between-S 
results is in relation to performance on early 
acquisition trials. All of the earlier between-S 
results and our own current ones are in 
agreement that, in the goal segment, Ss 
under CRF conditions show greater vigor of 
performance than Ss under PRF conditions 
from the very outset of training, and that 
this difference develops and stabilizes with- 
out reversal. Some of our within-S data 
suggest that S1± elicits greater response
vigor in the goal region early in training than 
does Sa (or S2+), although in most of our 
experiments the relative terminal levels of 
performance to S1 and S2 in the goal region
correspond to the between-S case, the partial
stimulus eliciting lesser vigor than the continuous
stimulus at that stage. It would be
unwise to overplay the importance of this 
suggestion from our data; the effect is not 
statistically reliable in Experiments 1, 2, or 4 
and in Experiment 3 it appears to depend 
on Color, as witness, for example, the Color 
Group × Reinforcement interaction in the
statistical analysis of the goal data for the
early part of acquisition. However, this same 
kind of effect seems to occur in several other 
experiments in our laboratory (e.g., Hender- 
son, 1966), and the weight of its appearance 
in a number of studies points to its possible
reliability. Should the early facilitation of
responses to S1± prove to be a reliable
phenomenon, the mechanism of anticipatory
frustration might explain both this effect and
the more vigorous performance of partially
reinforced responses observed early in the
response chain late in acquisition training.
Such an explanation might go something 
like this: in the within-S experiment, rR
building up to S2+ generalizes strongly to
S1± and creates, much earlier than is possible
in a between-S PRF group, the conditions
which are necessary for nonreward to
produce RF. At this early stage, then, S1±
and S2+ evoke reactions which are different
in this sense: that, to S1± but not to S2+,
as S gets closer to the goal, early in training,
there is aroused a mild rF reaction, too weak
to be aversive but strong enough to be
mildly, and unspecifically, exciting. There
would be little reason to expect rF to generalize
to S2+ at this stage since (a) rF is
weak, and (b) gradients of rF are steeper than
are gradients of rR (Amsel, 1962). As training
continues, the strength of rF increases to the
point of aversiveness, and responding becomes
less vigorous in the goal region to
S1± than to S2+. When rF is strong at the
goal and generalizes weakly to the run (and
even to start) segments, it serves as a nonspecific
energizer to produce the acquisition
crossover in the early segments of the response
chain. In the within-S experiment we
might therefore expect the same kind of
“crossover” near the goal early in training
as we see in some of the earlier alley segments 
later in training. 

According to this line of reasoning, the 
degree of initial facilitation of responding to 


S1± should be a function of the amount of
rR present early in training when nonrewards
occur in the presence of S1. We have tried to
manipulate the strength of rR by direct goal-box
placements preceding within-S PRF
acquisition; but our first attempt has been 
unsuccessful. A recent experiment by 
Trapold and Doren (1966) suggests that a 
replication of our experiment with more 
careful attention to the details of the place- 
ment procedure might be in order. They 
showed that the exact position of placement 
of an S in a goal box made a crucial differ- 
ence; that when, on partial-reward place- 
ment trials, S was placed directly over the 
food cup no PRE was observed on subsequent
extinction running trials, while placement
of S 8 in. from the food cup, requiring a
short approach response, yielded a PRE from
PR placements. 

PRF acquisition effects within-Ss. The 
within-S acquisition data in our experiments 
are, for the most part, successful replications 
of our earlier acquisition findings (Amsel 
et al., 1964), which have also recently been 
replicated in another laboratory (Ludvigson, 
1966). Attention should be drawn to the fact 
that in Experiments 3 and 4, in which 
acquisition comparisons can be made be- 
tween within-S and between-S curves, it is 
clear that the within-S differences are smaller 
than those found in the between-S condition, 
and that this discrepancy occurs mainly 
because the within-S continuous curve is 
lower than its between-S counterpart. This 
finding suggests generalization of inhibitory 
factors from S1± to S2+ during acquisition,
a finding also confirmed in a later experiment 
(Henderson, 1966). 

PRF acquisition effects between-Ss. Why is 
there no obvious Goodrich-type effect in the 
running or starting data of our between-S 
experiments? It is very surprising that such 
a question would need to be asked. We cer- 
tainly did not expect that the type of cross- 
over found by Goodrich and others in the 
start and run measures of between-S PRF 
experiments would not reappear in our own 
experiments, and we are faced with an 
unusual situation. We started out by doing 
an experiment in which we observed, 
within-S, some of the kinds of phenomena
that Goodrich and others had observed in
between-S experiments. However, when we
ran the between-S control which we thought
appropriate for our own type of experiment,
we could not recover the paradoxical between-S
asymptotic crossover effect we had
taken for granted to begin with. On the 
other hand, the extinction effect, the PRE, 
is clear enough in our between-S experiments 
in all measures but is, to say the least, not 
very clear in the within-S experiments. 

We can think of two possible variables 
which may account for our failure to produce 
crossover effects in between-S acquisition. 
The first and most obvious one is that our 
between-S experiments involved two colors 
(S1±S2± versus S1+S2+) while the earlier
ones involved only one (S1± versus S1+).
It is sheer conjecture to say that the inhibitory
effect of partial reward, which always
shows up early in the goal area, moves for- 
ward in the instrumental sequence more 
readily when S is exposed (“randomly”) to 
two colors of alley rather than just one. 
This is a possibility without any theoretical 
connotation of moment, unless running to 
two different colors in an alley somehow 
facilitates generalization, particularly of 
inhibitory effects. The second difference 
between our procedure and those of Good- 
rich, Wagner, and others is that while our 
procedure involves no prior feeding in the 
goal box and no preliminary adjustment 
training of any kind, their procedures 
typically involve a substantial amount of 
goal box exposure and feeding. The relevance 
of these variables to the effect in question is 
being tested in our laboratory, but the 
results are, as yet, inconclusive. 


SUMMARY 


In a previous experiment it was reported 
that when an individual S is trained to 
approach two stimuli, one of which (S1±)
is associated with partial reward and the
other (S2+) with continuous reward, PRF
acquisition effects are obtained which have
many similarities to PRF effects obtained 
in the usual between-S experiments. The 
results of this earlier experiment were that 
the between-S PRF effect, faster running 
by a partial than a continuous group early 


in the response chain and slower running 
by the partial group late in the chain, could 
be reproduced within the same S. This 
experiment was deficient in several respects: 
(a) black was always the partial stimulus 
and white the continuous (B±W+) and
the reverse color-reinforcement relation- 
ships were not included in the design; (b) 
there were no between-S groups in the 
experiment for direct within- and between-S 
comparisons; (c) rewards, rather than trials, 
to each of the stimuli were equated; and 
(d) extinction was not carried out and only
PRF acquisition effects were observed. Our
present report involves four experiments 
which cover the deficiencies of the earlier 
experiment while they allow further analysis 
of within-S and between-S PRF experi- 
ments. 

Experiment 1, which was in part a replica- 
tion of the earlier experiment, included both 
color groups as well as extinction. It was 
found that the PRF acquisition effects 
reported previously were replicable; however,
when Ss were extinguished to both S1
and S2 there was no reliable difference in
performance to the stimuli. There was a
suggestion that some Ss extinguished more
rapidly to S2+ than to S1±, the usual
between-S finding. It was further observed
that, early in training, Ss performed more
vigorously to S1± than to S2+ late in the
response chain, a result unlike anything
found in between-S PRF experiments. There
was also an indication of a Color × Reinforcement
interaction, white interacting
with partial reward to produce approach 
responses of greatest vigor. 

In Experiment 2 trials to both stimuli
(rather than rewards) were equated
(S1±S2+), and both acquisition and extinction
training were carried out. The results
indicated that PRF acquisition and extinc- 
tion effects under these conditions were 
similar to those obtained in Experiment 1. 

In Experiments 3 and 4 within-S groups 
(S1±S2+) as well as between-S groups were
given acquisition and extinction training 
with trials to both stimuli equated. The 
between-S groups also ran to both stimuli, 
but the continuous group was on a CRF
schedule to both (S1+S2+) while the partial
group was on a PRF schedule to both
(S1±S2±). Besides direct within- and between-S
comparisons, these experiments
allow a comparison of two extinction procedures:
in Experiment 3 between-S extinction
followed both within- and between-S ac- 
quisition, that is to say, half of the Ss in 
each group were extinguished only to S1
and the other half only to S2. Comparisons
of PRF and CRF extinction performance 
are here made under between-S conditions 
after both within-S and between-S acquisi- 
tion. In Experiment 4 extinction was within- 
S in both conditions, all Ss being extinguished
to both S1 and S2.

In both Experiments 3 and 4 the within-S 
groups showed some of the PRF acquisition 
effects observed in earlier experiments. The 
between-S acquisition data, on the other 
hand, did not show the often-reported 
(paradoxical) faster running by the PRF 
group early in the response chain, but rather 
showed slower running in PRF acquisition 
in all segments of the response chain. In 
Experiment 3 in which extinction was be- 
tween-S there was a strong effect of stimulus 
intensity in the within-S groups such that 
extinction to white produced a relatively
higher level of responding than did extinc- 
tion to black. Rate of extinction was not 
affected by stimulus intensity, however, and 
the within-S extinction findings of Experi- 


ments 1 and 2 were replicated. In the be- 
tween-Ss groups there was also an effect of 
stimulus intensity, but the usual PRF 
acquisition effect appeared. The within-S 
extinction performance was much more like 
between-S PRF extinction than like CRF 
extinction. 

In Experiment 4, in which extinction was 
within-S, the effect of stimulus intensity 
in extinction was reduced and the same 
pattern of within- and between-S results 
was found as in Experiment 3. A compari- 
son of the extinction performance of within-S 
and between-S groups in Experiment 4 shows 
within-S extinction following within-S training
to lie intermediate between the rates
of PRF and CRF between-S extinction, 
being somewhat more like partial than con- 
tinuous. 

The concluding discussion points up the 
acquisition and extinction characteristics
of the within-S experiments which differ 
from those of between-S experiments and 
suggests interpretations of these differences 
in terms of frustrative nonreward mecha- 
nisms. Some ways of investigating further 
the factors responsible for the within-S 
PRF extinction effect are discussed, and 
some possible reasons are advanced for 
failure to obtain the usual between-S acquisition
effects in these experiments.

A BLOCK ROTATION TASK:


THE APPLICATION OF MULTIVARIATE AND DECISION THEORY 
ANALYSIS FOR THE PREDICTION OF 
ORGANIC BRAIN DISORDER¹


PAUL SATZ²
University of Florida 


A multivariate instrument, designed to detect the likelihood of brain dis- 
order, was standardized and repeatedly cross validated. A block rotation 
test was constructed to measure S's ability to reproduce block designs as
they would look if rotated 90 degrees from the stimulus designs. 6 measures 
of error were inserted into the discriminant function along with age and 
WAIS PIQ. The final restandardization was based on 157 brain-injured Ss 
and 210 controls. Predictive validity was high on each of the validation and 
cross-validation samples. Discriminant scores were related to type and 
classification of brain injury, but not to area. The effects of base rates and 
cost efficiency on decisions were examined. 


In the last decade, increasing demands
have been made on clinical psychologists
to determine the presence or absence of 
brain lesions in man. Although the success of 
this venture has been disappointing, the 
psychologist has continued to predict the 
likelihood of brain dysfunction from psy- 
chological tests. Several reasons, both 
methodological and theoretical, have been
advanced to explain these shortcomings.
Methodological criticism (Satz, 1963; Yates, 
1954a) has focused on (a) failure to employ 
objective scoring procedures and quantifia- 
ble analysis of data; (b) failure to use ade- 
quate control groups; (c) reluctance to 
report normative data and optimal cut-off 
points for classification; (d) failure to con- 
trol for the relevant variables of age and 
intelligence; (e) tendencies to report the 
discriminatory efficiency of a test in terms
of group differences rather than classifica- 
tion accuracy; (f) failure to employ multi- 
variate statistical procedures or cost effi- 
ciency methods to determine the utility of 
the test(s) in different base rate populations; 
and (g) the lack of adequate cross-valida- 
tional studies. 
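
Criticisms (e) and (f) can be made concrete with a little arithmetic. The following Python sketch is offered only as an illustration, not as the procedure used in this study: the function names and the 80% sensitivity and specificity figures are hypothetical, chosen to show how one and the same test yields different predictive value as the base rate of brain disorder changes.

```python
# Illustrative sketch only: function names and all numeric values are
# hypothetical assumptions, not figures taken from the study.

def predictive_values(sensitivity, specificity, base_rate):
    """Return (PPV, NPV) of a test applied in a population with the given
    base rate of the disorder. Inputs are assumed to lie strictly
    between 0 and 1."""
    true_pos = sensitivity * base_rate                # disordered, called disordered
    false_neg = (1 - sensitivity) * base_rate         # disordered, called control
    true_neg = specificity * (1 - base_rate)          # control, called control
    false_pos = (1 - specificity) * (1 - base_rate)   # control, called disordered
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


def hit_rate(sensitivity, specificity, base_rate):
    """Overall proportion of correct classifications."""
    return sensitivity * base_rate + specificity * (1 - base_rate)


# Compare a balanced sample with a low base rate population.
for base_rate in (0.50, 0.10):
    ppv, npv = predictive_values(0.80, 0.80, base_rate)
    overall = hit_rate(0.80, 0.80, base_rate)
    print(f"base rate {base_rate:.2f}: PPV={ppv:.2f} NPV={npv:.2f} hit rate={overall:.2f}")
```

Under these assumed figures, 80% of the test's positive calls are correct at a base rate of .50, but fewer than one third are correct at a base rate of .10, where calling every S a control would give a higher overall hit rate (.90 against .80). A reliable group difference therefore does not by itself establish that a test is useful for classification in a given population.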

Additional criticism has been directed at 
the failure to consider possible dimensions 
of the concept “brain disorder.” It has been 
shown that the concept of brain disorder, 
as a homogeneous diagnostic construct, is 
at variance with current teachings in neu- 
rology (Merritt, 1959) and the experimental 
findings in psychology (Meyer, 1961; 
Reitan, 1962). Nevertheless, psychologists 
have often attempted to construct tests 
predictive of “brain damage” without ex- 
amining the effects of possible dimensions 
within the concept. A few examples are
types of brain disorder (acute, chronic,
static), classification of lesions (e.g., vascu- 
lar, neoplastic, traumatic), and localization 
or lateralization of lesions (e.g., left or right 
hemisphere). 


Types and Classification of Brain Disorder 


Neurological evidence indicates that dif- 
ferent types of brain lesions may cause 
marked differences in both symptomatology 
and behavior (Merritt, 1959). According to 
Fitzhugh, Fitzhugh, and Reitan (1961, p. 
61): 


The detrimental effects upon adaptive abilities 
due to acutely destructive lesions such as intrinsic 
tumors or cerebral vascular accidents may be more 
dramatic than the effects of relatively static con- 
ditions such as healed head wounds or slowly pro- 
gressive conditions. 


Indirect reference to this uncontrolled
variability was made by Yates
(1954a), who urged the use of comparable
experimental groups in the replication
of studies in this area. For example,
paretics comprised the majority of experi- 
mental subjects (Ss) in the validation of the 
Hunt-Minnesota Test for Organic Brain 
Damage (Hunt, 1943), whereas traumatic 
head injuries were used, for the most part, 
in the replication of this test (Aita, Armi- 
tage, Reitan, & Rabinowitz, 1947). The 
latter study failed to substantiate the find- 
ings reported by Hunt. Paresis generally 
involves a chronic and irreversible condition 
of the brain, whereas the effects of traumatic 
head injury are often transient and reversi- 
ble (Jasper, Kershman, & Elvidge, 1945). 

The only study which has attempted a 
systematic comparison of types of brain 
dysfunction was reported recently by Fitz- 
hugh et al. (1961). The classification con- 
sisted of acute types, in which neurological 
signs were due to a specific, temporally 
defined brain lesion; relatively static types 
which consisted of those patients who had 
either recovered from acute episodes, if any, 
or who exhibited slowly progressive brain 
disease without evidence of sudden onset; 
and chronic-static types which were composed
of institutionalized patients having long- 
standing brain dysfunction. Results were 
in the expected direction of greater impair- 


ment for patients who suffered from acute 
organic damage than for patients having 
relatively static damage (e.g., posttraumatic 
concussions, psychomotor epilepsies) or 
chronic-static damage (e.g., convulsive dis- 
orders of the grand mal type). Classifica- 
tion of the independent variables, as such, 
has a twofold effect. It increases the effi- 
ciency of the test(s) by reducing the amount 
of uncontrolled variance, and further in- 
creases our understanding with respect to 
the behavioral correlates of types of brain 
dysfunction. 


Localization and Lateralization of Brain 
Lesions 


In the last decade there has been much 
emphasis on the search for specific and 
localized functions in the brain. The most 
convincing support of specific brain functions
comes from studies of the language
process. The left hemisphere has been found 
to govern language functions almost ex- 
clusively (Penfield & Roberts, 1959). Pa- 
tients with left temporal-parietal lesions 
have shown consistent deficits on tests which 
require verbal reasoning and informational 
skills (Dennerll, 1964; Reitan, 1955, 1962; 
Satz, 1966; Weinstein & Teuber, 1957). 
Further, these deficits have occurred in the 
presence or absence of clinically described 
language disorder. The right hemisphere, on 
the other hand, has been shown to have a 
different pattern of functional organization. 
Damage to this hemisphere has been associ- 
ated with perceptual difficulties in manipu- 
lating, ordering, and effecting spatial rela- 
tionships. Such deficits have been measured 
by nonverbal tasks and have been defined 
as visuo-constructive functions (Milner, 
1954, 1962; Reitan, 1955; Teuber, 1962). 

General support for the assumption that 
the cerebral hemispheres mediate differen- 
tial functions could have significant value 
in the design of new tests sensitive to brain 
dysfunction and localization. However, the 
lateralization of visuo-constructive functions 
to the right cerebral cortex has not been 
entirely confirmed. Costa and Vaughan 
(1962), employing a series of complex 
perceptual, motor, and verbal tasks on patients
with lateralized cerebral lesions, found
constructional and perceptual deficits for
both left and right hemispheric cases, al- 
though maximal impairment was obtained 
for the right brain lesions. Maximal deficit 
for the left brain lesions, however, was in 
the predicted direction of impaired verbal 
performance. This deficit was found ex- 
clusively in patients with left hemispheric 
damage. Similar results were obtained in 
an earlier study by Heilbrun (1956). Em- 
ploying tests of language and tests of non- 
verbal visuo-constructive performance for 
patients with lateralized cerebral lesions, 
he found identical impairments for both 
lateral brain-lesion groups on the nonverbal 
tasks, but demonstrated that verbal deficits 
resulted primarily from left hemispheric 
damage. 

These findings suggest that the organiza- 
tion of verbal processes is strongly lateralized 
in the left cerebral cortex, particularly with 
right handers, whereas the organization of 
nonverbal perceptual and visuo-construc- 
tive skills is probably more diffusely repre- 
sented in both hemispheres, with possibly 
greater representation in the right cerebral 
cortex. This position is consistent with 
physiological data on somesthesis (Semmes, 
Weinstein, Ghent, & Teuber, 1960), and 
more recent findings on the disruption of 
visuo-constructive skills after either left or 
right hemispheric damage (Arrigoni & De 
Renzi, 1964). 


Nonspecific Effects 


There is also evidence to suggest that on
more complex nonverbal perceptual tasks
(e.g., embedded figures), deficits will occur 
irrespective of locus of lesions, or laterality 
(Teuber & Weinstein, 1956). The Embedded 
Figures Test (EFT; Teuber, 1959) requires 
the rapid detection of “hidden” figures, 
that is, of complex line drawings concealed 
by embedding them in interlacing contours. 
Deficits on this test transcended the area of 
visual field defects, and revealed similar 
impairments for men with injuries in any 
lobe, or in either or both hemispheres. Ac- 
cording to Teuber (1959) the effects of 
brain lesions in man are twofold—specific 
and general, and vary as a function of the 
type of test(s) employed. Similar findings
have been observed in Lashley's work with
the rat. Lashley (1960) demonstrated that 
habits of the simple conditioned reflex type 
were dependent only upon the specific 
sensory areas involved. For more complex 
tasks, however, such as the multiple-stick 
problems, deterioration occurred after sub- 
stantial lesions to any part of the cortex. 

Similar nonspecific effects have been 
found in somesthesis (Teuber, 1959). Again, 
the kind of symptom obtained (i.e., specific 
or general) varied as a function of the kind 
of tests employed. For example, many of the 
simple classical sensory tests yielded rela- 
tively circumscribed deficits, restricted to a 
body region opposite to a unilateral brain 
injury when the central sector of the brain 
was involved. However, performance on a 
complex tactile task (a modified Seguin-Goddard
formboard test), which was constructed
to represent a logical and perceptual
analogue of the embedded figures task, was 
essentially nonspecific with respect to area 
of damage. Impairment was found for all 
brain-injury groups, irrespective of locus of 
lesion, or laterality. 

These findings (Teuber, 1959) would seem 
to deserve some consideration in attempts 
to design tests diagnostically sensitive to 
brain dysfunction. Selection of a task 
should require some measure of complex 
visual or somesthetic performance similar
to the “embedded figures” or formboard
tasks previously discussed. The fact that 
these complex measures were sensitive to 
brain lesions, irrespective of their locus or 
laterality, offers some promise for further 
research in this direction. Apparently under- 
scoring the importance of these complex 
tests is the fact that adequate performance 
depends on a number of different psycho- 
logic functions, each of which is crucial. 
Hence, any lesion sufficient to affect one of 
these functions may lead to a significant
general impairment. 


Problem of Test Selection 


The preceding discussion has focused on 
some of the methodological and theoretical 
problems that have often been ignored in 
attempts to construct diagnostic tests of 
brain dysfunction. There are, however, 
additional problems that should be considered,
namely the choice of the test itself.
Although the preceding discussion articu- 
lated some of the relevant independent and 
dependent variables, the problem still 
remains that brain damage very often leads 
to changes which are minimal and elusive, 
and which require very special tasks for 
their discovery (Teuber, 1959). Standard 
intelligence tests have notoriously failed to 
detect deficits in IQ following destruction 
of significant parts of the human brain 
(Hebb, 1949; Meyer, 1961; Weinstein & 
Teuber, 1957; Yates, 1954a). A puzzling
finding from these studies is the apparent 
resiliency of intellectual functions subse- 
quent to brain injury. But is this statement 
true? For normal persons, vocabulary and 
information tests are presumed to be the 
best individual measures of general intelli- 
gence. These tests are further assumed to 
give us the best estimates of a person’s level 
of problem solving outside the laboratory or 
clinic, in a wide range of situations. Accord- 
ing to Hebb (1949, p. 290), the nature of 
the problem is as follows: “How can they 
(the tests) also be the ones that show the 
least effect of brain operation, and the de- 
generative changes of senescence?” 

Hebb reasoned that the apparent resili- 
ency of intellectual functions after brain 
injury was due to the heavy loading of 
stored-information experiences on tests such 
as the Binet and the Army General Classifi- 
cation Test (AGCT). He distinguished be- 
tween two components of intelligence, one 
which refers to an innate capacity for acquisi- 
tion (Intelligence A), and the other which 
refers to the functioning of the brain in 
which development has already occurred 
(Intelligence B). The latter concept involves 
day-to-day experiential functioning already 
based on a history of learning. According 
to Hebb, it is this functioning level that is 
largely measured in psychometric tests of 
intelligence. After behavior has been learned 
or acquired, it then loses its dependency on 
the underlying neural mechanisms and can 
persist in spite of damage to this brain tissue. 
There is, however, a concomitant decrease 
in the likelihood of new learning as a result 
of this tissue injury. This suggests that the 
more sensitive indicator of impairment 


would involve measurement of Intelligence 
A, rather than B. Hebb adopts a phase or 
temporal model for intelligence which pre- 
dicts that these two components of intelli- 
gence are differentially affected in children 
and adults after brain injury. The hypothe- 
sis is advanced that the performance of 
children on intellectual tasks is more fully 
determined by Intelligence A, and for adults, 
by Intelligence B. This hypothesis is con- 
sistent with findings on the irreversibility 
of intellectual impairment in the brain- 
injured child (Hebb, 1949; Strauss & Lehti- 
nen, 1950). The same theoretical viewpoint 
might also explain why so many brain- 
injured adults are able to perform adequately 
on intellectual tasks involving familiar 
problems associated with long-established 
habits. 

Hebb’s position (1949) on intelligence is 
quite similar to the one advanced by Hal- 
stead (1947). Halstead also distinguished 
between two components of intelligence, 
“psychometric” (Intelligence B) and “biologic”
(Intelligence A) and reasoned that
conventional IQ tests are heavily dependent 
on “psychometric” components, that is, 
familiar problems associated with long- 
established habits. Halstead (1947) sug- 
gested that this practice has led to a masking 
of deficits on conventional IQ tests, which
otherwise would reflect significant group 
differences due to brain injury. It would 
seem, then, that the desirable features of a 
diagnostic test for brain damage should 
include, at least in part, some measure of 
biologic intelligence (Intelligence A), and
should require minimal dependence on 
previous experience for the solution of the 
task. It is interesting to note the similarity 
in this position with Goldstein’s (1959) 
hypothesis that the capacity for learning is
significantly impaired in organic brain dis- 
order. 

The preceding discussion suggests the 
feasibility of employing a complex measure 
of visual or somesthetic performance as a
promising dependent variable in the con- 
struction of a diagnostic test for brain dam- 
age (Teuber, 1959). These tests also seem 
to meet the requirements advanced by Hebb 
(1949) and Halstead (1947). They are suffi- 
ciently complex measures which require 
minimal use of language; they introduce new
kinds of learning situations and they require 
only minimal use of memory or past ex- 
perience for their solution. Unfortunately, 
none of these complex measures has yet 
provided sufficient discriminatory power for 
diagnostic classification (Teuber, 1959). Several
factors may account for this shortcoming. First,
these tests have been used in experimental rather
than practical diagnostic settings.
Second, there has been no attempt
to evaluate the performance of psychiatric 
patients, particularly schizophrenic Ss, on 
these special tests. Third, performance on 
these tests has been shown to correlate dis- 
proportionately with general intelligence. 
For example, Teuber and Weinstein (1956), 
employing the EFT, obtained a Pearson r 
of .75 with the AGCT, which suggests that 
test complexity has been obtained at the 
expense of an undesirable increase in diffi- 
culty. It also suggests the need to partial 
out the effects of general intelligence on 
complex visual-perceptual measures. 
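
The partialling suggested above can be illustrated with the standard first-order partial correlation. The following is a minimal sketch only: the function name is hypothetical, and every input value other than the reported r of .75 is invented for illustration and is not taken from the studies cited.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z held constant."""
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    return (r_xy - r_xz * r_yz) / denom


# x = complex perceptual test, y = diagnostic criterion, z = general intelligence.
# Only r_xz = .75 comes from the text; r_xy and r_yz are hypothetical values.
adjusted = partial_correlation(r_xy=0.60, r_xz=0.75, r_yz=0.50)
print(round(adjusted, 2))
```

With these assumed values the test-criterion relationship shrinks once intelligence is held constant, which is the kind of attenuation the text anticipates for measures that correlate highly with general intelligence.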


The Rotation Phenomenon 


In the search for a more efficient psycho- 
logical measure it would be difficult to ignore 
the experimental and clinical literature 
dealing with the phenomenon of rotation. 
This particular visual-motor error tendency 
was first observed on the Goldstein-Scheerer 
Cube Test, which requires S to reproduce 
patterns with 1-inch colored cubes (Gold- 
stein & Scheerer, 1941). Some patients 
(mostly organic) while completing the de- 
signs correctly left the pattern in a rotated 
position, apparently without the awareness 
of the rotation. Such distortions were tilted 
at an angle to the target design, sometimes 
as much as 45 to 90 degrees. This perceptual
error has subsequently been observed on
several different types of visual-motor tasks
(e.g., the Bender-Gestalt, the Graham-
Kendall Memory-for-Designs Test (MFDT), 
Benton's Memory for Designs, and the 
WAIS Block Design subtest). Rotation 
errors on these tasks have consistently shown 
a positive relationship to brain injury and 
mental retardation (Bender & Teuber, 1948; 
Griffith & Taylor, 1960a; Pascal & Suttell, 
1951). Attempts to use these rotation
measures as diagnostic aids, however, have
not produced scorable indices of demonstrable
validity (Chorost, Spivak, & Levine,
1959). Some of the difficulty is attributable
to the infrequent occurrence of rotation
errors on the stimulus designs used. The
tests, for example, were not constructed to
elicit and/or measure this response tendency.

The most systematic attempts to investigate
this “perceptual anomaly” have been
made by Shapiro (1951, 1952, 1953). In the 
Block Design Rotation Test (BDRT), which 
was devised by Shapiro, S had to reproduce 
various stimulus designs presented by the 
experimenter (E) which were composed of 
either square or diamond figures placed on 
either square or diamond backgrounds. The 
task was designed specifically to test the 
assumed relation between rotation and 
certain geometric properties of the stimulus 
designs. Rotation was measured as the 
number of degrees by which S’s design 
differed in orientation from that of the 
stimulus design. Shapiro (1953) found that 
brain-injured Ss rotated their reproductions 
significantly more than normal and psychi- 
atric controls. He hypothesized that the 
greater rotation tendencies in brain-damaged 
Ss were due to an increase in cortical inhibi- 
tion caused by trauma which left the patients 
peripherally blind to ground determinants 
of the stimulus. Although Shapiro’s findings 
have been replicated in subsequent studies 
(Williams, Lubin, Gieseking, & Rubenstein, 
1956, 1961), his theoretical interpretation 
has been rejected. Williams et al. (1956, 
1961) experimentally used field reducers in 
order to block out peripheral cues on this 
task and found that the organics signifi- 
cantly reduced their rotation errors under 
this condition while the normals significantly 
increased their rotations. The authors con- 
cluded that whereas the normal S probably 
uses peripheral cues as guides to correct 
orientation of his designs, the brain-injured 
S may be confused and distracted by them. 
This might explain the organic's tendency to
become “stimulus-bound” on complex perceptual
tasks (Goldstein, 1959). The stimulus
boundedness would result from a failure
to avoid the distracting peripheral cues 
which would consequently reinforce S's 
attention to more concrete attributes of the 
figure. These conclusions are in agreement
with those advanced by Strauss and Lehti- 
nen (1950) who state that brain-damaged Ss 
are easily distracted by stimuli, and that 
appropriate retraining methods should in- 
volve gradual reduction of peripheral stimu- 
lation during the relearning process. 


Rationale 


The preceding studies suggest the useful- 
ness of the rotation effect as a diagnostic 
measure of brain damage. The BDRT, pro- 
posed by Shapiro (1953), has several of the 
attributes of a complex perceptual task 
which Teuber and co-workers have found 
sensitive to lesions, irrespective of locus or 
laterality in the brain. Like the EFT, the 
BDRT shares many of the attributes of a 
“biologic” measure of intelligence. Unlike 
the EFT, however, the BDRT has the 
advantage of having been designed in a 
clinical setting as a diagnostic measure for 
brain disorder. The BDRT is not without 
certain limitations, however. First, its 
diagnostic efficiency has never been greater 
than 75% correct classification. This has 
been due primarily to its failure to classify 
correctly Ss who fall in the dull normal and 
lower ranges of intelligence (Williams et 
al., 1956, 1961; Yates, 1954b). Second, the 
BDRT, like most diagnostic tests, was con- 
structed as a single variable instrument in 
which the effects of age and IQ were not 
systematically partialled out. Third, the 
efficiency of the BDRT has not been studied 
under conditions of varying base rate prob- 
abilities for different diagnostic populations. 
A fourth criticism relates to its laborious 
and expensive scoring procedure; each de- 
sign must be photographed after each repro- 
duction, and only later subjected to detailed 
measurement. Finally, it is doubtful that the 
stimulus properties in the BDRT provide the 
optimal conditions for eliciting the rotation 
effect. For example, the mean rotation score 
reported for organics has been small (ap- 
proximately 7 degrees). Even then, logarith- 
mic transformations have been necessary to 
correct for the extreme negative skewness of 
the rotations obtained (Yates, 1954b). 

With the assumption that the rotation 
effect is a potentially useful procedure for 


the detection of organic brain dysfunction, 
the following methods of measurement and 
analysis are proposed for this study: (a) 
construction of a new test of the rotation 
effect, involving multiple measures, in 
which the S is required to rotate a certain 
number of degrees and in a particular direc- 
tion on each of the stimulus problems; (b) 
the application of multivariate discriminant 
analysis for the prediction of discrete cri- 
terion groups with this instrument; and (c) 
the application of base rate and cost effi- 
ciency analyses to determine the utility and 
predictive validity of this multivariate
instrument. 
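
The base rate analysis proposed in (c) rests on the fact that a test with fixed hit rates within each group yields different practical results as the proportion of brain-damaged Ss in the population changes. The following is a minimal sketch only: the function names are hypothetical, and the within-group hit rates of .75 are assumed values chosen for illustration, not results from this study.

```python
from typing import Tuple


def classification_rates(sensitivity: float, specificity: float,
                         base_rate: float) -> Tuple[float, float]:
    """Return (overall proportion correct, positive predictive value)."""
    true_pos = sensitivity * base_rate
    false_pos = (1.0 - specificity) * (1.0 - base_rate)
    true_neg = specificity * (1.0 - base_rate)
    overall = true_pos + true_neg
    flagged = true_pos + false_pos
    ppv = true_pos / flagged if flagged > 0 else float("nan")
    return overall, ppv


def majority_rule_accuracy(base_rate: float) -> float:
    """Accuracy obtained by assigning every S to the more frequent group."""
    return max(base_rate, 1.0 - base_rate)


# Hypothetical test: 75% of organics and 75% of nonorganics correctly identified.
for base_rate in (0.50, 0.25, 0.10):
    overall, ppv = classification_rates(0.75, 0.75, base_rate)
    print(base_rate, round(overall, 3), round(ppv, 3),
          majority_rule_accuracy(base_rate))
```

Under these assumptions the overall proportion correct stays constant while the proportion of positive calls that are correct falls as the base rate falls, and at low base rates the test does worse than simply assigning every S to the more frequent group.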


METHOD 
Nature of Task 


The psychological measure proposed was a 44- 
item visual-motor task designed for optimal meas- 
urement of the rotation effect. The Block Rotation 
Test (BRT) involved a series of newly constructed 
block designs employing the WAIS blocks (solid 
red and white colors only). Sample designs are 
shown in Figures 1 and 2. The task was composed 
of two parts. Part A (Figure 1) consisted of 15 de- 
signs which were presented in a vertical or hori- 
zontal position to S. Part B (Figure 2) consisted 
of seven designs which were presented at various 
angles to the vertical-horizontal axis. 

The E sat across the table and in front of S, and 
constructed each stimulus design with the blocks. 
The S then reproduced each of the stimulus de- 
signs (given additional blocks) as they would look 
if rotated 90 degrees, either to the left or to the 
right, in randomized order. The S was not allowed 
to turn the stimulus design and was required to 
manipulate only one block at a time. The blocks 
presented had both color (red and white) and de- 
sign properties, but the designs were made simpler 
than those of the WAIS Block Design subtest in 
order to limit the general effect of psychometric 
intelligence. Such a simplification was achieved 
by using only solid colors for any one block and by 
reducing the complexity of the total design. It was 
further assumed that the difficulty of the designs 
might be reduced by eliminating the element of 
symbolic representation (i.e., design cards) and 
restricting the stimulus designs to actual blocks. 

The unique feature of this task is that S was re- 
quired to rotate on every stimulus design. Further- 
more, the task required a different kind of per- 
formance from S. He was not required to reproduce 
block designs as presented in abstract stimulus 
cards; and he found little, if any, opportunity to 
utilize past experience in the solution of the task. 
In short, he was no longer required merely to reproduce,
recognize, or discriminate various complex
stimulus materials. He was confronted with a
new test situation in which the solution was not 
embedded in his stimulus field. In terms of figure-
ground properties of the task, the S was required 
to shift or alternate constantly between figure and 
ground perceptions. The figure was defined as the 
stimulus design which E constructed and placed in 
front of S. The ground determinants were defined 
as all possible degrees of rotation from the given 
stimulus design or figure. Only one particular 
ground-spatial relationship, however, was correct 
for each stimulus figure, that is, a 90-degree rota- 
tion either to the left or to the right from the 
stimulus (depending on E's instructions). What 
was originally figure (i.e., the stimulus design) had 
to be converted into a new figure which was part 
of the spatial background for that percept. The 
difficulty arose from the fact that the ground determinants
were neither present, embedded, nor
concealed; and they were not in the S's stimulus 
field. As such, they required the S to resort to 
"inner" inferences, regarding the appropriate 
background rotations, which were assumed to in- 
volve more central brain processes. In other words, 
it was felt that the absence of the background 
visual cues would displace the reliance on sensory
field information to more centrally determined 
visual-perceptual processes in the brain. 


Administration and Scoring³


Detailed instructions and scoring method are 
presented in the appendix of the author's disserta- 
tion (Satz, 1963, pp. 178-200). Testing was per- 
formed on a table, approximately 26 inches square. 
The E sat directly across from S throughout the 
testing procedure and constructed each block design
one at a time from a stack of 3 X 5 design
cards which were placed in front of E. There were 
two sets of design cards: one for Part A, which 
consisted of 15 printed cards plus 2 printed exam- 
ple cards, and one set for Part B, which consisted 
of 7 printed design cards plus 1 printed example 
card. Selected designs, for each part, are presented 
in Figures 1 and 2. Each card also depicted the 
order of directional turn for the particular design 
(i.e., left or right). To the right of E was a single
3 X 5 scoring card which listed by number the 15 
block designs on Part A and the 7 block designs 
on Part B. This recording procedure allowed E to 
analyze errors with respect to individual designs 
for both left- and right-turn rotations on both 
parts of the test. The S’s name, age, and education 
were also recorded on this card. 

Part A of the test was administered first and 
was preceded by two example designs which were 
not scored. The first example was a horizontal design
composed of two solid blocks (one red and one
white) which were adjacent to each other. The second
design was a vertical design composed of two
solid blocks (one red and one white) which were
adjacent to each other.


³ All testing on the initial standardization was
performed by the present author. In the subse- 
quent cross-validation studies, however, all testing 
was performed by other laboratory personnel. 
Fig. 1. Sample designs, Part A.


To briefly summarize the procedure, S was first 
shown how the stimulus design (Example A) would 
look if it were turned to the left or to the right 90 
degrees. The E did this by turning the stimulus 
design 90 degrees in both directions. The S was 
then given the opportunity to practice these 90- 
degree quarter turns on Example A. When S was 
able to turn this first example design to a criterion 
of three successes for each direction, he was then 
required to discontinue turning and to attempt to 
reproduce this design (given additional blocks) 
as it would look if turned 90 degrees to the left 
and to the right. When S completed this design 
correctly, for both turns, administration proceeded 
to the second example involving the vertical de- 
sign. The S was requested not to turn this design, 
but to picture how it would look if turned 90 degrees
in the desired direction and then to build
this visual image. The S was permitted to rotate 
the stimulus design only if he failed to make the 
appropriate rotation. When S completed this de- 
sign correctly, for both turns, testing proper be- 
gan. 

Part B of the test was preceded by only one ex- 
ample design which was made at a 45-degree angle 
to the vertical axis. The design was also composed 
of two solid blocks (one red and one white) which 
were adjacent to each other. Like the first example 
on Part A, S was first shown how the stimulus de- 
sign (Example A) would look if it were turned to 
the left or to the right 90 degrees. The E did this 
by again turning the stimulus design 90 degrees in 
both directions. The S was then given the oppor- 
tunity to practice these quarter turns on Example 
A. The S was told, on this part of the test, that 
each design would end up at an angle due to the 
nature of the 90-degree turn. When S was able to 
turn this example design to a criterion of three 
successes for each direction, he was then required 
to discontinue turning and to attempt to reproduce 
this design (given additional blocks) as it would 
look if turned 90 degrees to the left and to the right.
When S completed this design correctly, for both
turns, testing proper began. Both parts of the test
were scored only for error responses which are discussed
below.

Fig. 2. Sample designs, Part B.

Part A errors. Part A consisted of 15 designs 
with either horizontal or vertical axes. The first
scorable design was composed of two blocks similar 
to the two example designs on this part of the test. 
Seven of the remaining designs were composed of 
three blocks, and seven were composed of four 
blocks. Each of these designs had to be rotated 90 
degrees in both directions. An error was defined as 
any design made by S which was not a reproduc- 
tion of the stimulus design as it would look if 
turned 90 degrees in the requested direction. It 
was recorded by a checkmark (√) on E's 3 X 5
scoring card for the particular design and turn
under Part A. The maximum number of scorable
errors (√) was 30 for this part.
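
The error rule for Part A can be sketched in code. The following is an illustration only, under assumptions that are not in the source: a design is represented as a grid viewed from above, with "R" for a red block, "W" for a white block, and None for an empty cell; a left turn is taken to be a counterclockwise quarter turn and a right turn a clockwise one; and all function names are hypothetical. The grid model applies to the horizontal and vertical designs of Part A, not to the angular designs of Part B.

```python
from typing import List, Optional

# "R" = red block, "W" = white block, None = empty cell (assumed encoding).
Grid = List[List[Optional[str]]]


def rotate_right(design: Grid) -> Grid:
    """Quarter turn clockwise, as seen from above."""
    return [list(row) for row in zip(*design[::-1])]


def rotate_left(design: Grid) -> Grid:
    """Quarter turn counterclockwise, as seen from above."""
    return [list(row) for row in zip(*design)][::-1]


def is_error(stimulus: Grid, response: Grid, direction: str) -> bool:
    """True if the response is not the stimulus turned 90 degrees as requested."""
    if direction == "left":
        expected = rotate_left(stimulus)
    else:
        expected = rotate_right(stimulus)
    return response != expected


# The first example design: two adjacent blocks in a horizontal row.
example = [["R", "W"]]
print(rotate_left(example))
print(is_error(example, [["R", "W"]], "left"))
```

In this sketch a response that merely copies the stimulus is scored as an error, because the copy differs from the rotated design.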

Part B errors. Part B consisted of seven angular 
designs (i.e., the axes were at a 45-degree angle to 
the vertical). The first scorable design was com- 
posed of two blocks similar to the example design 
for this part of the test. Four of the remaining de- 
signs were composed of three blocks and two were 
composed of four blocks. The only principle of 
symmetry which guided this choice of alternate 
design patterns, for each part of the test, came 
from empirical observation during the pilot stages 
of the study. Each of the designs on Part B was
also scored for left and right turns. An error was 
again defined as any design made by S which was 
not a reproduction of the stimulus design as it 
would look if turned 90 degrees in the requested 
direction. It was also recorded by a checkmark 
(√) on E's 3 X 5 scoring card for the particular
design and turn under Part B. The maximum number
of scorable errors (√) for this part was 14.

Total errors. Errors were summed separately for 
Part A and for Part B of the test, and they were 
also combined, giving a maximum possible Total 
error score of 30 + 14 = 44. 

Types of errors were also scored. Under this 
classification there were three distinct variables 
which are listed as follows: 

Duplication errors. The Duplication error was 
defined as any design made by S which merely re- 
produced E's stimulus design. It was recorded by 
the symbol DE on E's scoring card and was classified
both as an error and as a type of error. The
symbol DE, therefore, indicated a twofold classification:
an error response (√) and the type of
error involved. 

Angulation errors. The Angulation error was de- 
fined as any design made by S which was angulated 
or tilted on Part A, or was horizontal or vertical on 
Part B. This type of error score was a logical con- 
sequence of the fact that Part A required horizon- 
tal-vertical designs for the correct rotated posi- 
tion, and Part B required diagonal designs for the 
correct rotated position. These errors were re- 
corded by the symbol AE on E's scoring card, and 
were also classified as an error response (√) and
as a type of error. 


Time errors. The Time error was defined as a 
failure to complete the rotated design within a 65- 
second time limit. In the administration of the 
test, E allowed a 5-second interval on all designs 
between giving the request for a left or right turn, 
and placing the blocks before S. The S was then 
allowed 60 seconds to make the appropriate design 
rotation. The procedure of employing a 5-second 
interval, before placing the blocks before S, was 
used in an attempt to provide S with an oppor- 
tunity to visualize the rotated stimulus design 
before manipulating the blocks; it was felt that 
this procedure would reinforce emphasis on the 
perceptual rotation and also reduce trial and error 
behavior. This error was recorded by the symbol 
TE on E's scoring card, and also indicated an error 
response (√) and the type of error involved.

There were, in summary, six measures of error 
which were not all independent: (a) Part A errors, 
(b) Part B errors, (c) Total errors, (d) Duplication 
errors, (e) Angulation errors, and (f) Time errors. 
An error, in general, was defined, once again, as
any design made by S which failed to reproduce 
the stimulus design as it would look if turned 90 
degrees in the requested direction. This operational
definition was both a necessary and sufficient con- 
dition for all errors, regardless of whether they 
were classified by type or not. For types of errors, 
however, this definition was a necessary, but not 
sufficient condition. For example, a design which 
deviated from what the stimulus design would 
look like, if rotated 90 degrees in the appropriate 
direction, was always an error (√), but for this
design to constitute a particular type of error, it 
had to meet the specified requirements already 
defined for this class of error. Whether it fulfilled 
these requirements or not, it was still an error 
(√). Since it was possible for a person to make exclusively
Duplication errors, or Angulation errors,
or Time errors on both parts of the test, the maximum
number of errors possible for each type of
error was equal to the maximum number of errors
obtainable on both Part A (i.e., 30) and Part B
(i.e., 14). For example, if a person made 10 Duplication
errors on Part A and 3 Duplication errors on
Part B, then his error scores would be classified
under at least the following four variables: Part
A errors (N = 10); Part B errors (N = 3); Total
errors (N = 13); Duplication errors (N = 13).
The same would be true for Angulation and Time 
errors. 
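
The six error measures can be tallied from a trial-by-trial record, as in the following sketch. It is an illustration only: the `Trial` record and `score_brt` function are hypothetical names, and the sketch assumes that each error carries at most one type symbol (DE, AE, or TE), which the scoring description does not state explicitly.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Trial:
    part: str                         # "A" or "B"
    is_error: bool                    # True if a checkmark was recorded
    error_type: Optional[str] = None  # "DE", "AE", "TE", or None if untyped


def score_brt(trials: List[Trial]) -> Dict[str, int]:
    """Tally the six error measures from one S's scoring card."""
    scores = {"part_a": 0, "part_b": 0, "total": 0,
              "duplication": 0, "angulation": 0, "time": 0}
    type_key = {"DE": "duplication", "AE": "angulation", "TE": "time"}
    for trial in trials:
        if not trial.is_error:
            continue
        scores["part_a" if trial.part == "A" else "part_b"] += 1
        scores["total"] += 1
        if trial.error_type is not None:
            scores[type_key[trial.error_type]] += 1
    return scores


# The worked example from the text: 10 Duplication errors on Part A
# (30 scored turns) and 3 Duplication errors on Part B (14 scored turns).
card = ([Trial("A", True, "DE")] * 10 + [Trial("A", False)] * 20
        + [Trial("B", True, "DE")] * 3 + [Trial("B", False)] * 11)
print(score_brt(card))
```

Because every typed error is also counted as an error, the type scores can never exceed the Total error score, which is why the six measures are not all independent.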


Additional Test and Nontest Variables 


Along with the six error variables measured by 
the BRT, five additional test and nontest varia- 
bles were further included in this study. The two 
nontest variables were age and education. Age was 
included because of its relationship to both “psychometric”
intelligence (Wechsler, 1958) and
“biologic” intelligence (Reitan, 1956). Educational 
level was included because of its relationship to 
socioeconomic level and general intelligence 
(Griffith & Taylor, 1960b). The three test variables 
were the Abbreviation of the WAIS for Clinical 
Use (Mogel & Satz, 1963; Satz & Mogel, 1962), the
Trail Making Test (TMT) (Armitage, 1946; 
Reitan, 1958), and the MFDT (Graham & Kendall, 
1960).⁴ Only the Performance IQ Scale of the Abbreviated
WAIS was administered; this was to insure
a nonverbal psychometric IQ control for the
BRT. Pearson r correlations between the Abbrevi- 
ated and original WAIS forms have been reported 
as follows (Satz & Mogel, 1962): Verbal IQ = .99;
Performance IQ = .97; Full Scale IQ = .99. Cor- 
relation coefficients of this magnitude were found 
regardless of intellectual level or diagnostic classi- 
fication. Subsequent research has confirmed these 
findings (Estes, 1963; Pauker, 1963; Watson, 1966). 
The TMT and MFDT are standardized tests for 
brain damage and were included as a comparative 
means of evaluating the efficiency of the BRT. 


Method of Analysis 


The discriminant function (Fisher, 1936) was 
used in the analysis of the data. This statistical 
technique was devised for the problem of maxi- 
mally differentiating discrete criterion groups 
when multiple measurements are involved. The 
technique is essentially a multiple regression 
problem except for the discontinuous distribution 
on the criterion variables. The expression of this 
function is given by the following linear equation: 


$$Z = \lambda_1 x_1 + \lambda_2 x_2 + \lambda_3 x_3 + \cdots + \lambda_n x_n \qquad (1)$$


in which Z is the composite or compound predictor
score based on the individual scores on each of the
variables or tests employed ($x_1, x_2, x_3, \ldots, x_n$),
and on the respective weights (i.e., lambdas) assigned
to each of these scores ($\lambda_1, \lambda_2, \lambda_3, \ldots, \lambda_n$).
The task for a problem involving only two criterion 
groups, when multiple measures are involved, is to 
determine the optimal weights for these variables 
which will make the difference between the composite
Z measures on both criterion groups as
large as possible; or, in other words, to find values 
for the lambda coefficients which will maximize the 
difference between the composite means of the two 
groups involved (Garrett, 1943; Goulden,
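
For the two-group case, the weights that maximize the separation of the composite means can be obtained by solving the pooled within-group scatter matrix against the vector of mean differences. The following is a minimal sketch of that computation in plain Python, not the program used in this study: all function names are hypothetical, and the two small samples are invented for illustration.

```python
from typing import List, Sequence

Vector = List[float]
Rows = Sequence[Sequence[float]]


def mean_vector(rows: Rows) -> Vector:
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def pooled_scatter(groups: Sequence[Rows]) -> List[Vector]:
    """Sum of within-group cross-product matrices of deviations."""
    p = len(groups[0][0])
    s = [[0.0] * p for _ in range(p)]
    for rows in groups:
        m = mean_vector(rows)
        for row in rows:
            d = [x - mu for x, mu in zip(row, m)]
            for i in range(p):
                for j in range(p):
                    s[i][j] += d[i] * d[j]
    return s


def solve(a: List[Vector], b: Vector) -> Vector:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fisher_weights(group1: Rows, group2: Rows) -> Vector:
    """Lambda weights, defined up to an arbitrary positive scale factor."""
    m1, m2 = mean_vector(group1), mean_vector(group2)
    diff = [a - b for a, b in zip(m1, m2)]
    return solve(pooled_scatter([group1, group2]), diff)


def composite(weights: Vector, x: Sequence[float]) -> float:
    """Z score of Equation 1 for one S."""
    return sum(w * xi for w, xi in zip(weights, x))


# Invented data: (Total errors, age) for two hypothetical criterion groups.
organics = [(12.0, 50.0), (15.0, 55.0), (10.0, 60.0), (14.0, 48.0)]
controls = [(3.0, 45.0), (5.0, 52.0), (2.0, 40.0), (4.0, 50.0)]
weights = fisher_weights(organics, controls)
print(weights)
print([round(composite(weights, x), 3) for x in organics + controls])
```

A cutting score on Z would still have to be chosen separately; the sketch stops at the weights and the composite scores.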


Subjects 


The standardization sample (N = 122) consisted 
of four groups of adult males: normals (N = 20), 
neurotics (N = 20), schizophrenics (N = 23), and
organics (N = 59). The organic brain disorders 
were selected from two different Veterans Ad- 
ministration Hospitals; one subgroup (V = 33) 
included samples from the Acute-Intensive Treat- 


ment Service at the Veterans Administration 
es 


‘The Trail Making and Memory-for-Designs 
Tests were administered by the present author, 
but the scoring was performed by Walter Lindley, 
a Clinical Psychology Trainee at the Veterans 
Administration Hospital, Lexington, Kentucky. 
Scoring was performed without knowledge of the 
Ss’ diagnosis, age, or Performance IQ. 


Hospital, Lexington, Kentucky; the other sub- 
group (N = 26) was composed of samples from the 
neurological wards at Hines Veterans Administra- 
tion Hospital, Chicago, Illinois. The latter sub- 
group was included in order to provide more repre- 
sentative cases of neurological brain involvement. 
All diagnoses for the organic patients were based 
upon decisions of the hospital medical staffs. These 
decisions were based upon detailed medical his- 
tory, electroencephalography, neurological ex- 
amination, and when further classification was 
needed, angiography and pneumography. 

The organic patients were classified by type of 
brain involvement (N = 59), and by area of brain 
involvement when objectively possible (N = 35). 
These classifications were employed in an attempt 
to account for some of the variability in performance 
among brain-injured patients.5 

Classification by type of brain damage was made 
according to the criteria advanced by Fitzhugh et 
al. (1961, p. 61). One group, acute, was composed 
of patients (N = 23) who had acute neurological 
illnesses and whose neurological symptoms were 
present at the time of psychological testing. These 
patients had experienced a specific temporally 
defined episode, during which their current neuro- 
logical findings had arisen, or had developed a 
rapidly progressive brain disease with steady pro- 
gression of neurological signs. A second group, 
relatively static, was composed of patients (N = 
24) who had either recovered from acute neuro- 
logical signs, or who had slowly progressive brain 
disease without evidence of sudden onset. Among 
this group, the patients with sudden onset of brain 
dysfunction (e.g., brain trauma) had with the 
passage of time recovered from acute neurological 
deficits, suggesting a reorganization of brain func- 
tion and a relatively static condition of the brain. 
The third group, chronic-static, was composed of 
patients (N = 12) with chronic, long-standing 
brain dysfunction. The patients in this group con- 
sisted largely of prefrontal lobectomies and 
chronic epileptics. Diagnoses of the patients 
within each group according to type of brain in- 
volvement are presented in Table 4. 

Classification by area of brain involvement was 
as follows: prefrontal lesions (N = 10), which 
included primarily cases of prefrontal lobectomy; 
left temporoparietal lesions (N = 15), which 
included for the most part cerebral vascular accidents 
(CVA) and brain tumors; and right temporoparietal 
lesions (N = 10), which for the most part 
included CVA and brain tumor cases. 

The neurotic, schizophrenic, and normal Ss 
were all selected from the Veterans Administra- 
tion Hospital, Lexington, Kentucky. The neurotic 
and schizophrenic patients were selected from the 
Acute-Intensive Treatment Service of the hos- 
pital. The normal group was also selected from 


5 In the final restandardization (DF III, 1966) 
the organic patients were also grouped by classifi- 
cation of lesion (e.g., vascular, neoplastic, con- 
vulsive). 
TABLE 1 

MEANS AND DIFFERENCES BETWEEN CRITERION GROUPS ON THE PREDICTOR VARIABLES: 
STANDARDIZATION GROUP (DF I) 

                 Nonorganic (N = 63)     Organic (N = 59) 
Variables          Mean       SD          Mean       SD          t 
Part A             3.08      2.56        11.83      7.09       8.95* 
Part B             2.78      1.86         7.78      3.07      10.88* 
Total              5.81      3.85        19.61      9.30      10.58* 
Duplication         .52       .98         4.46      4.56       6.49* 
Angulation          .22       .52         2.29      3.13       4.99* 
Time                .32       .67          .97      1.63       2.84* 
Age               36.22      8.09        39.66      8.74       2.26* 
PIQ               95.98     10.79        91.73     11.07       2.15* 

* p ≤ .05. 

this service and consisted of male psychiatric nursing assistants. The diagnoses for the neurotic and schizophrenic patients were contingent on the final decisions of the medical-psychiatric staffs. In no case was a neurotic or schizophrenic patient included if he had a past history of head injury or neurological disease. The same criteria applied for selection of the normal Ss. 


Hypotheses 


The following hypotheses were addressed to possible dimensions within the concept of brain disorder, that is, type of damage, area of damage, and classification of disease. 

Hypothesis 1. Changes in test behavior after cerebral damage will vary as a function of the type of damage involved. In line with Fitzhugh et al. (1961), acute types of brain damage will reveal more significant impairment in test performance than either relatively-static or chronic-static brain-injury cases. Discriminant function composite scores were used in this analysis. 

Hypothesis 2. Changes in test performance after cerebral damage will show nonspecific and generalized effects on a test of complex perceptual functioning such as the BRT. Specifically the hypothesis predicts that performance will be unrelated to either locus or laterality of brain involvement, due to the complex nature of the task employed. This hypothesis (Teuber, 1959) is in contrast to the alternative hypothesis that would predict maximal impairment, on visuo-constructive tasks, for right hemispheric damage (Milner, 1962; Reitan, 1962). Discriminant function composite scores were used in this analysis. 

Hypothesis 3. Changes in test behavior will vary as a function of classification of disease. This hypothesis is addressed to the possible differential effects between structural lesions (e.g., neoplastic and vascular conditions) and nonstructural lesions (e.g., convulsive disorders and toxic drug conditions). Discriminant function scores, based on the final restandardization group (DF III), were used in this analysis. 


RESULTS 


Table 1 presents the mean differences 
between the combined control (Group I) and 
organics (Group II) on the eight predictor 
variables which showed the best discrimina- 
tion between criterion groups. The varia- 
bles included the six BRT variables along 
with Age and Performance IQ. Inspection 
of this table reveals significant differences 
between criterion groups on each variable. 
The differences were in the direction of 
higher scores (i.e., errors) for the organics 
on all but the Performance IQ variable. 
Performance IQ was lower for the brain- 
injured group (p < .05). The three variables 
which were excluded from further analysis 
were the TMT, MFDT, and education. 
Although the TMT correctly classified 85% 
of the brain-injured Ss (Group II), it also 
misclassified 73% of the combined controls 
(Group I). The MFDT, on the other hand, 
correctly classified only 49% of the brain- 
injured Ss (Group II) and misclassified 42% 
of the psychiatric and normal controls 
(Group I). Education also showed a high 
overall rate of misclassification (valid 
positive = 46%; false negatives = 40%). 
In fact, the mean and standard deviation 
scores for education were quite similar for 


6 The reader is referred to the author's doctoral 
dissertation (Satz, 1963) for detailed information 
concerning the statistical selection of predictor 
variables, and the reasons for combining the 
normal, neurotic, and schizophrenic groups. The 
decision to combine the three nonorganic groups 
into one criterion group was based essentially 
on the failure to differentiate these groups on the 
predictor variables. 
TABLE 2 

INTERCORRELATIONS FOR TOTAL SAMPLE a OF EIGHT PREDICTOR VARIABLES 

                             2        3          4            5         6       7         8 
                          Part B    Total   Duplication  Angulation   Time             Performance 
Variable                  error     error      error        error     error    Age        IQ 
1. Part A error            .769     .973       .740         .378      .059     .321      -.416 
2. Part B error                     .896       .602         .499      .067     .293      -.354 
3. Total error                                 .731         .443      .065     .329      -.417 
4. Duplication error                                        .195     -.008     .305      -.090 
5. Angulation error                                                   .008     .158      -.259 
6. Time error                                                                  .183      -.137 
7. Age                                                                                    .031 

Note. Correlation coefficients of .178 and .233 are significant at the .05 and .01 levels, respectively. 
a N = 122 (20 normals, 20 neurotics, 23 schizophrenics, and 59 organics). 


each of the criterion groups (Combined 
controls, X = 10.17, SD = 2.77; Organics, 
X = 10.31, SD = 3.20). 

Table 2 shows the Pearson product- 
moment correlation coefficients for the 
eight variables on the initial standardization 
sample (N = 122). Although there were a 
number of significant correlations (n = 19), 
only a few were large enough to account for 
a sizable amount of the variance. The high- 
est correlation obtained was between Part A 
and Total errors (r = .97); the inflation of 
this correlation coefficient was due in part 
to the confounding effects of Part A which 
was included in the summation of Total 
errors. The relationship of Age and Per- 
formance IQ with the other measures in this 
correlation matrix was of particular interest. 
Both variables correlated significantly with 
the three best individual predictors on the 
BRT. Performance IQ was inversely related 
to Part A errors (r = -.42), Part B errors 
(r = -.35) and Total errors (r = -.42); 
Age was positively related to Part A errors 
(r = .32), Part B errors (r = .29) and Total 
errors (r = .33). 
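The count of significant correlations can be checked against the critical values given in the note to Table 2. The sketch below is only a tally; the list holds the upper triangle of Table 2 as read here, and several entries that are illegible in the scan are filled in from the rounded values quoted in the text (for example .321 for the Age and Part A correlation), which is an assumption.

```python
# Upper triangle of Table 2, row by row (28 coefficients for 8 variables).
# Some entries are reconstructions from the rounded values in the text.
CORRELATIONS = [
    .769, .973, .740, .378, .059, .321, -.416,   # Part A error
    .896, .602, .499, .067, .293, -.354,         # Part B error
    .731, .443, .065, .329, -.417,               # Total error
    .195, -.008, .305, -.090,                    # Duplication error
    .008, .158, -.259,                           # Angulation error
    .183, -.137,                                 # Time error
    .031,                                        # Age
]


def count_significant(values, critical_r):
    """Number of coefficients whose absolute value reaches the critical r."""
    return sum(1 for r in values if abs(r) >= critical_r)


n_at_05 = count_significant(CORRELATIONS, .178)
n_at_01 = count_significant(CORRELATIONS, .233)
print(n_at_05, n_at_01)
```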


The Discriminant Analysis (Discriminant 
Function I, 1963) 


The discriminant function analysis was 
performed on the eight variables presented 
in Tables 1 and 2. The criterion groups were 
classified dichotomously, that is, nonorganic 
(Group I) or organic (Group II). By com- 
puting the within-group sums of squares 
and cross products for all combinations of 
eight variables, a set of eight simultaneous 


equations was obtained. The solution of 
these equations yielded the following lambda 
values for each variable: Part A ($\lambda_1$ = -5.3141), 
Part B ($\lambda_2$ = 3.5168), Total ($\lambda_3$ = 12.4600), 
Duplication ($\lambda_4$ = -1.000), Angulation ($\lambda_5$ = 7.000), 
Time ($\lambda_6$ = 25.3278), Age ($\lambda_7$ = -2.0666), and 
Performance IQ ($\lambda_8$ = 3.9282). 

The data were next analyzed to determine the mean composite discriminant score for each criterion group, where the organic group was defined as Population A and the nonorganic group, Population B. For example, the mean composite discriminant score for the organic group would have the following expression: $\bar{Z}_A = \lambda_1 \bar{X}_{A1} + \lambda_2 \bar{X}_{A2} + \cdots + \lambda_8 \bar{X}_{A8}$, in which $\bar{Z}_A$ represents the composite discriminant score based on the mean scores of each variable for the organics ($\bar{X}_{A1}, \ldots, \bar{X}_{A8}$), and the corresponding weights assigned to each of the respective variables ($\lambda_1, \ldots, \lambda_8$). The results were as follows: $\bar{Z}_A$ = 523.24, $\bar{Z}_B$ = 376.86. 
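The two mean composites can be approximated from the published figures. The sketch below applies the DF I weights to the group means of Table 1. The signs of the weights are taken as read from the text (Part A, Duplication, and Age negative), which is an assumption where the scan is unclear, and the variable names are illustrative. Because the tabled means are rounded, the results should only come close to 523.24 and 376.86 rather than match them exactly.

```python
# DF I weights in the order: Part A, Part B, Total, Duplication,
# Angulation, Time, Age, Performance IQ (signs as read from the text).
LAMBDAS = [-5.3141, 3.5168, 12.4600, -1.000, 7.000, 25.3278, -2.0666, 3.9282]

# Group means from Table 1, same variable order.
ORGANIC_MEANS = [11.83, 7.78, 19.61, 4.46, 2.29, 0.97, 39.66, 91.73]
NONORGANIC_MEANS = [3.08, 2.78, 5.81, 0.52, 0.22, 0.32, 36.22, 95.98]


def composite_score(weights, scores):
    """Linear composite Z = sum of weight * score."""
    return sum(w * x for w, x in zip(weights, scores))


z_a = composite_score(LAMBDAS, ORGANIC_MEANS)      # organic group
z_b = composite_score(LAMBDAS, NONORGANIC_MEANS)   # nonorganic group
print(round(z_a, 2), round(z_b, 2))
```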

In order to reach a decision on any individual's composite score, the following strategy was adopted to determine the optimal cut-off score: 

If $Z_i \geq (\bar{Z}_A + \bar{Z}_B)/2$, then predict Population A (organic). 

If $Z_i < (\bar{Z}_A + \bar{Z}_B)/2$, then predict Population B (nonorganic). 



7 The data were originally computed on a 
Monroe calculator and later reanalyzed on an 
IBM 709 computer at the University of Kentucky 
Computing Center. 
TABLE 3 

PREDICTIVE CLASSIFICATIONS BY USE OF DISCRIMINANT FUNCTION I a 

[Frequency distribution of composite scores by diagnostic group, in 25-point intervals from 300-324 to 675-699; the cell counts are not legible in the source.] 

a Composite cut-off score: Z = 450.05; overall hits = 89%, valid positives = 81%, false positives = 3%. 


The midpoint, $(\bar{Z}_A + \bar{Z}_B)/2$, yielded the composite 
value of Z = 450.05 as the optimal cut-off score 
for this linear prediction equation. 
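The decision rule reduces to a few lines of arithmetic. This is a minimal sketch using the two reported group means; the function names are illustrative and not part of the original analysis.

```python
Z_ORGANIC_MEAN = 523.24      # reported mean composite, Population A
Z_NONORGANIC_MEAN = 376.86   # reported mean composite, Population B


def midpoint_cutoff(z_a, z_b):
    """Cut-off placed halfway between the two group mean composites."""
    return (z_a + z_b) / 2


def predict(z, cutoff):
    """Predict 'organic' at or above the cut-off, 'nonorganic' below it."""
    return "organic" if z >= cutoff else "nonorganic"


CUTOFF = midpoint_cutoff(Z_ORGANIC_MEAN, Z_NONORGANIC_MEAN)
print(round(CUTOFF, 2), predict(500.0, CUTOFF), predict(400.0, CUTOFF))
```

A midpoint cut-off treats the two kinds of error as equally costly and ignores base rates; a different setting could justify moving it.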

The discriminant function was next analyzed to test the difference between the composite Z scores for the two criterion groups. This analysis of variance is essentially a test of significance of the discriminant function. The results showed significant differentiation between criterion groups on the composite Z scores (F = 29.79, p < .001). 

Predictions were then made on each individual in the study. This was done by computing the composite discriminant score (Z) for each S and predicting “organic” if his composite score was Z ≥ 450.05, and “nonorganic” if his composite score was Z < 450.05. The results of this decision policy are presented in Table 3. Inspection of this table reveals that only two nonorganic Ss were misclassified, giving a false positive rate of $p_2$ = .03; the misclassification rate for the organic group, however, was larger, and yielded a valid positive rate of only $p_1$ = .81. In spite of the failure to detect several brain-injured cases, the discriminant function still correctly classified 89% of the total standardization sample (N = 122). 
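The three rates follow directly from the classification counts. The sketch below assumes 11 missed organic cases out of 59 (the figure given later in the text for the total organic sample) and 2 misclassified nonorganic cases out of 63; the function and key names are illustrative.

```python
def classification_rates(valid_pos, false_neg, false_pos, valid_neg):
    """Valid positive, false positive, and overall hit rates from counts."""
    n_organic = valid_pos + false_neg
    n_nonorganic = false_pos + valid_neg
    return {
        "valid_positive_rate": valid_pos / n_organic,
        "false_positive_rate": false_pos / n_nonorganic,
        "overall_hit_rate": (valid_pos + valid_neg) / (n_organic + n_nonorganic),
    }


# 59 organics with 11 missed; 63 nonorganics with 2 false positives
rates = classification_rates(valid_pos=48, false_neg=11, false_pos=2, valid_neg=61)
for name, value in rates.items():
    print(f"{name}: {value:.1%}")
```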


Analysis of Types of Brain Damage 

Table 4 presents the diagnostic distribu- 
tions within each of the three types of brain- 
lesion groups. The results of this analysis 
are reported in Table 5. The composite 
predictor means of the respective groups 
were as follows: acute (Z = 551.77), relatively 
static (Z = 489.02), and chronic-static (Z = 
532.78). It is interesting to note that each 
of these composite Z scores was above the 
minimal score critical for organic brain dis- 
order (Z = 450.05), although the relatively 
static brain-lesion group did tend to con- 
verge closer to this cutting line. The analysis 
of variance of these composite predictor 


TABLE 4 

DISTRIBUTION OF SUBJECTS ACCORDING TO TYPE OF BRAIN DAMAGE 

Acute (N = 23) 
  Cerebral vascular accident            13 
  ABS, b drug intoxication               1 
  Pneumococcal meningitis                1 
  CBS, a brain tumor                     6 
  Huntington's Chorea                    1 
  Encephalitis                           1 

Relatively static (N = 24) 
  CBS, brain trauma                     10 
  Post-traumatic concussion              1 
  Cerebral arteriosclerosis              2 
  Psychomotor epilepsy                   2 
  CBS, convulsive disorder               6 
  CNS lues, meningoencephalitic          1 
  CBS, drug intoxication                 1 
  Parkinson's disease                    1 

Chronic-static (N = 12) 
  Bilateral frontal lobectomy            6 
  Unilateral frontal lobotomy            1 
  Convulsive disorder, grand mal         4 
  Psychomotor epilepsy                   1 

a CBS = Chronic Brain Syndrome. 
b ABS = Acute Brain Syndrome. 
TABLE 5 

COMPOSITE PREDICTOR MEANS FROM DISCRIMINANT FUNCTION I BY TYPE OF 
BRAIN INVOLVEMENT 

Type of damage          N     Composite mean 
Acute                  23        551.77 
Relatively static      24        489.02 
Chronic-static         12        532.78 

Analysis of variance of composite predictor scores 

Source of variation      SS       df      MS       F       p 
Between groups         113.00      2     56.50    4.15    <.05 
Within groups          763.00     56     13.63 
Total                  876.00 

Differences between group composite means 

                     Relatively static    Chronic-static 
Acute                    -62.75**            -18.99 
Relatively static                             43.76 

** p ≤ .01. 



scores (Table 5) revealed an overall separation between groups (F = 4.15, p < .05). The only significant difference between group means occurred between the acute and relatively static brain lesions (t = 2.83, p < .01), with the acute lesions showing greater impairment. There was, however, no significant difference between the acute and chronic-static groups, although the trend was in the direction predicted. Further analysis revealed that 11 of the 12 Ss within the chronic-static group (91%) were correctly classified as organic. In view of past difficulties in detecting cases of prefrontal damage, the latter finding is of some interest. The hit rate (Hr) for the acute group was also 91%, with 21 of the 23 Ss being correctly classified. It is impossible to compare these respective hit rates with Fitzhugh et al. (1961), because this information was not reported in their study. 

An unexpected finding was the fact that 8 of the 11 diagnostic misclassifications for the total organic sample (i.e., 73%) were represented within the relatively static group. Apparently the higher false negative rate (1 - $p_1$ = .19) for the discriminant function was due primarily to the inclusion of relatively static brain-lesion cases, which in turn reduced the valid positive rate ($p_1$ = .81). In spite of this limitation, however, the discriminant function still classified correctly 67% of the relatively static cases. 
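The group-by-group hit rates can be laid out from the counts reported above. In this sketch the number of misses per group is inferred from the text (2 of 23 acute, 8 of 24 relatively static, 1 of 12 chronic-static); the dictionary layout and names are illustrative.

```python
# type of damage: (N, number misclassified as nonorganic)
GROUPS = {
    "acute": (23, 2),
    "relatively static": (24, 8),
    "chronic-static": (12, 1),
}


def hit_rate(n, missed):
    """Proportion of a group correctly classified as organic."""
    return (n - missed) / n


hit_rates = {name: hit_rate(n, missed) for name, (n, missed) in GROUPS.items()}
total_missed = sum(missed for _, missed in GROUPS.values())
static_share = GROUPS["relatively static"][1] / total_missed

for name, rate in hit_rates.items():
    print(f"{name}: {rate:.1%}")
print(f"share of all misses in the relatively static group: {static_share:.1%}")
```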

The results, in summary, did support the 
general hypothesis that the type of brain 
lesion is a meaningful variable within the 
concept of "brain damage." The specific 
hypothesis relating to the direction of differ- 
ences between types, however, was only 
partially confirmed. 


Analysis of Generalized Effects 


This analysis was performed on 35 brain-injured Ss who were classified according to locus and laterality of brain involvement. The results are reported in Table 6. The composite predictor means for each group were as follows: left temporoparietal damage (Z = 546.87), right temporoparietal damage (Z = 543.30), and frontal lobe damage (Z = 534.70). The mean composite Z scores for each group were well above the critical cutting line for organic brain disorder (Z = 450.05). Analysis of variance on the composite scores, however, failed to show any significant overall difference between groups. This finding therefore supports the hypothesis of nonspecific effects with this test instrument. 


TABLE 6 

COMPOSITE PREDICTOR MEANS FROM DISCRIMINANT FUNCTION I BY AREA OF 
BRAIN INVOLVEMENT 

Area of damage            N     Composite mean 
Left temporo-parietal    15        546.87 
Right temporo-parietal   10        543.30 
Frontal                  10        534.70 

Analysis of variance of composite predictor scores 

Source of variation        SS         df       MS         F       p 
Between groups            900.24       2      450.12    .0975     ns 
Within groups          147799.93      32     4618.75 
Total                  148700.17 
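The F ratio in Table 6 can be recomputed from the tabled sums of squares and degrees of freedom. The sketch below is a plain arithmetic check and does not compute a p value; the function name is illustrative.

```python
def f_ratio(ss_between, df_between, ss_within, df_within):
    """Mean squares and F ratio for a one-way analysis of variance."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within


# Values from Table 6 (area of brain involvement)
ms_b, ms_w, f_value = f_ratio(900.24, 2, 147799.93, 32)
print(round(ms_b, 2), round(ms_w, 2), round(f_value, 4))
```

An F this far below 1 indicates that the three group means differ by less than would be expected from the within-group variability alone.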


Test-Retest Reliability 


In order to evaluate the reliability of the 
test, the BRT was readministered to a random 
sample of Ss (N = 18) from the original 
standardization group (N = 122). The 
Performance IQ Scale of the WAIS (Satz & 
Mogel, 1962) was also readministered to 
this sample. The problem was analyzed in 
two ways: (a) in terms of the Pearson prod- 
uct-moment correlations on each test varia- 
ble between testings and (b) in terms of the 
classification changes on the composite Z 
scores after retesting. The retest composite 
Z scores were computed to provide some 
general measure of classification reliability. 
Roughly 4 weeks intervened between the 
test and retest sessions for each S. 

The Pearson correlation coefficients for 
each variable were as follows: Part A errors, 
r = .89; Part B errors, r = .85; Total errors, 
r = .91; Duplication errors, r = .81; Angula- 
tion errors, r = .75; Time errors, r = .67; 
and Performance IQ, r = .89. The correla- 
tion coefficients between test and retest 
sessions were significant for each of the 
variables (p < .01), although three of the 
variables (Duplication, Angulation, and 
Time errors) were more subject to change 
during this test-retest interval. With regard 
to classification changes on the composite 
Z scores, only one S showed a change in 
predictive classification after retesting. This 
involved a schizophrenic patient who was 
originally misclassified on the first testing, 


but who was correctly classified on retesting. 
All other Ss were predicted within the same 
criterion group on both testings, regardless 
of diagnosis. These results, in summary, 
suggest adequate demonstration of reliability 
for the individual variables and the com- 
posite predictions. 
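For readers who wish to run the same kind of check, the test-retest coefficient for one variable is an ordinary Pearson product-moment correlation between the two sessions. The sketch below uses made-up scores for six subjects; they are hypothetical and are not the retest data of this study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical Total error scores at test and at retest four weeks later
first_testing = [3, 5, 12, 18, 7, 9]
retesting = [4, 4, 11, 20, 8, 7]

r = pearson_r(first_testing, retesting)
print(round(r, 2))
```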


Cross-Validation Studies (1964-1966) 


The following studies were undertaken to 
examine the predictive validity of the BRT 
in a new diagnostic population, and to con- 
trol for the possible examiner bias in the 
original standardization (DF I, 1963). The 
studies are as follows: (a) cross validation of 
DF I on a new sample of brain-lesion and 
control Ss (1964); (b) restandardization 
(DF II) based on the original standardiza- 
tion group (DF I) and the 1964 cross-valida- 
tion sample; (c) cross validation of the 
DF II on an additional sample of brain- 
lesion and control Ss (1965); and (d) re- 
standardization (DF III) based on the 
second standardization groups (DF II) and 
the 1965 cross-validation sample (1966). 

The purpose of these separate validational 
analyses was twofold: (a) to obtain greater 
representation in the criterion groups in 
order to stabilize the lambda weightings 
and (b) to determine when adequate predic- 
tive validity had been demonstrated. 

The Ss were all selected from the Inpatient 
and Outpatient Services of the Teaching 
Hospital at the University of Florida Col- 
lege of Medicine, Gainesville, Florida. All Ss, 
from both criterion groups, were given 
thorough neurological evaluations by the 
staff of the Department of Neurology. Several 
of the neurological patients were also 
evaluated by the Division of Neurosurgery. 
Classification of Ss to the organic criterion 
group was again based upon detailed medical 
history, neurological examination, electro- 
encephalography, brain scans, skull films, 
and when necessary, arteriography, angiog- 
raphy, and pneumography. Classification of 
Ss to the nonorganic criterion group was 
based upon at least a negative medical 
history and neurological examination, and 
frequently upon negative EEG and skull- 
film reports. In roughly 25% of the cases, 
angiography and/or pneumography was 
carried out due to the nature of the present- 
ing symptoms. 

The nonorganic Ss were selected from this 
population in order to provide a more real- 
istic approximation of the typical inpatient 
medical setting in which patients are likely 
to present a wide variety of “apparent” 
neurological disorders. This selection pro- 
cedure also provided a more detailed exam- 
ination of the Ss assigned to the nonorganic 
criterion group. Detailed compositions of 
both groups are presented in the final re- 
standardization (DF III). 

The BRT and WAIS were administered 
by the staff and trainees of the Clinical 
Neuropsychology Laboratory, with the exception 
of the present author.8 All testing 
was done “blind” without reference to 
hospital or referral charts. This procedure 
is routinely followed in psychological evalu- 
ations in the laboratory. 

Discriminant function I: Cross validation. 
This investigation was made on 100 consecutive 
referrals to the laboratory during 
1964. The sample consisted of 48 brain-lesion 
cases (Group I) and 52 psychiatric, general 
medical, and normal Ss (Group II). Group I 
was represented equally by vascular, neo- 
plastic, traumatic, and convulsive disorders. 
Group II was largely represented by psychi- 
atric and general medical cases. Classifica- 
tion was made on the basis of the lambda 


8 The author is deeply grateful to Eileen Fennell, Research Assistant, who was responsible for supervising the administration of the BRT to predoctoral students and who handled the detailed follow-up classification of patients during the cross-validation analyses. 


coefficients derived from DF I and each S's score on the eight predictor variables in the present sample. The original cutting line of Z = 450.05 was again used. These results are presented in Table 7. Inspection of this table shows that although the instrument correctly classified 75% of the total sample (N = 100), there was a 14% shrinkage in overall hits compared to the original standardization (DF I). This is not too surprising when one considers the fact that a new population was sampled, different examiners were used, and further, that the original standardization was based on a relatively small number of Ss (N = 122). The main source of classification error occurred with Group II, in which 29% of the nonorganics were falsely classified. This represents a 26% increase in the false positive rate over the original standardization (DF I). Of greater interest, however, is the striking similarity in the valid positive rate between the two studies. The original standardization function correctly classified 81% of the organics, and 79% of the organics in the cross-validation sample. Further, 5 of the 10 false negative errors were convulsive disorders in which the only criterion evidence was EEG 

TABLE 7 

CROSS VALIDATION OF DISCRIMINANT FUNCTION I (1964 SAMPLE) a (N = 100) 

Interval composite scores    Nonorganics (N = 52) 
675-699 
650-674 
625-649                               1 
600-624 
575-599                               1 
550-574 
525-549                               2 
500-524                               4 
475-499                               4 
450-474                               3 

425-449                              15 
400-424                               5 
375-399                               6 
350-374                               9 
325-349                               1 
300-324                               1 

[The Organics (N = 48) column is not legible in the source.] 

a Composite cut-off score: Z ≥ 450.05; overall hits = 75%, valid positives = 79%, false positives = 29%. 
TABLE 8 

MEANS AND DIFFERENCES BETWEEN CRITERION GROUPS ON THE PREDICTOR VARIABLES: 
RESTANDARDIZATION GROUP (DF II) 

Variables          Nonorganics (N = 117)    Organics (N = 105)       t 
Part A                     3.52                   11.27            9.26* 
Part B                     2.99                    7.40            9.55* 
Total                      6.51                   18.67            9.50* 
Duplication                 .66                    3.64            4.11* 
Angulation                  .62                    2.91            3.42* 
Time                        .98                    1.45            2.40* 
Age                       34.52                   41.63            2.08* 
Performance IQ            99.05                   94.80            2.18* 

* All t values significant at p ≤ .05. 


abnormality and a history of fits, that is, 
no structural damage was evidenced. 
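The shrinkage figures quoted for the first cross validation are simple differences between rates. The sketch below assumes 38 valid positives among the 48 organics (10 false negatives, as stated) and 15 false positives among the 52 nonorganics (inferred from the reported 29%); the names are illustrative.

```python
# Rates reported for the original standardization (DF I)
STANDARDIZATION = {"overall": 0.89, "valid_pos": 0.81, "false_pos": 0.03}


def rates(valid_pos, n_organic, false_pos, n_nonorganic):
    """Classification rates from counts in the two criterion groups."""
    valid_neg = n_nonorganic - false_pos
    return {
        "overall": (valid_pos + valid_neg) / (n_organic + n_nonorganic),
        "valid_pos": valid_pos / n_organic,
        "false_pos": false_pos / n_nonorganic,
    }


cross = rates(valid_pos=38, n_organic=48, false_pos=15, n_nonorganic=52)

# Changes expressed in percentage points
shrinkage = round((STANDARDIZATION["overall"] - cross["overall"]) * 100)
fp_increase = round((cross["false_pos"] - STANDARDIZATION["false_pos"]) * 100)
print(shrinkage, fp_increase, round(cross["valid_pos"] * 100))
```

Both changes are differences in percentage points, not relative changes.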

On the basis of these findings it was de- 
cided to combine the data of this sample 
with the original standardization sample 
and to compute a new discriminant function 
based on more representative cases. 

Discriminant function II: Restandardization (1964). Table 8 presents the means and differences between the organic (N = 105) and nonorganic (N = 117) groups on each of the discriminant predictor variables for the original standardization group (DF I) and the 1964 cross-validation sample combined.9 Inspection of this table reveals significant group differences on each variable. Although the neurological group was both older (t = 2.08, p < .05) and less intelligent (t = 2.18, p < .05), their mean level of intelligence was within the Normal Range (X = 94.80) and their mean age was not high (X = 41.63). In other words, Group I was not heavily composed of deteriorated organic patients, a composition which would tend to spuriously inflate the valid positive rate. 

The discriminant function analysis was 
computed on an IBM 709 Computer with 
the use of the UCLA BIMED program 


9 Two Ss who were originally diagnosed as 
epileptic were reclassified as nonorganic on the 
basis of additional information provided by the 
Neurology staff. The author extends his apprecia- 
tion to Richard Weaver (Department of Neurol- 
ogy) and Lamar Roberts (Chief, Division of 
Neurosurgery) for their help in reevaluating some 
of the equivocal neurological cases. 


(No. 005).10 The test of significance of the two group mean composites was significant (F = 20.99, p < .001), suggesting good separation between criterion groups on the composite scores. The following lambda coefficients were obtained: Part A ($\lambda_1$ = 3.5355), Part B ($\lambda_2$ = 3.6545), Total ($\lambda_3$ = -2.3610), Duplication ($\lambda_4$ = .4889), Angulation ($\lambda_5$ = .7204), Time ($\lambda_6$ = 1.3166), Age ($\lambda_7$ = -.0286), and PIQ ($\lambda_8$ = .2080). The derived composite cut-off score was Z = 35.92. 

Table 9 presents the classification of Ss on the basis of the new weightings (DF II). The table reveals that Discriminant Function II correctly classified 82% of the total sample (N = 222), with a false positive rate of 12% and a valid positive rate of 76%. By obtaining greater representation in both criterion groups, the high false positive rate on DF I (cross validation) decreased without appreciably altering the high valid positive rate in both studies. Further analysis again revealed that epileptic disorders accounted for the majority of the false negative errors. 
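The summary rates for DF II can be recovered from the grouped frequencies in Table 9. The sketch below reads blank cells as zero and treats an interval as "predicted organic" when its lower bound lies at or above the cut-off of 35.92; both readings are assumptions about the table layout, and the names are illustrative.

```python
# (interval lower bound, nonorganic count, organic count), from Table 9;
# blank cells in the table are entered as 0.
TABLE_9 = [
    (76, 0, 1), (72, 0, 4), (68, 0, 6), (64, 0, 9), (60, 0, 3),
    (56, 1, 4), (52, 0, 7), (48, 2, 14), (44, 3, 12), (40, 2, 11),
    (36, 6, 9), (32, 8, 6), (28, 33, 9), (24, 34, 7), (20, 25, 3),
    (16, 3, 0),
]
CUTOFF = 35.92


def rates_from_distribution(rows, cutoff):
    """Classification rates from a grouped frequency distribution."""
    false_pos = sum(non for low, non, org in rows if low >= cutoff)
    valid_pos = sum(org for low, non, org in rows if low >= cutoff)
    n_non = sum(non for _, non, _ in rows)
    n_org = sum(org for _, _, org in rows)
    valid_neg = n_non - false_pos
    return {
        "n_nonorganic": n_non,
        "n_organic": n_org,
        "valid_positive_rate": valid_pos / n_org,
        "false_positive_rate": false_pos / n_non,
        "overall_hit_rate": (valid_pos + valid_neg) / (n_non + n_org),
    }


df2 = rates_from_distribution(TABLE_9, CUTOFF)
print(df2)
```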

Discriminant function II: Cross validation. The purpose of this investigation was to determine the predictive validity of DF II on a new sample of brain-lesion and control Ss. The study was carried out on 151 consecutive referrals to the laboratory during 1965, and consisted of 61 brain-lesion cases (Group I) and 90 psychiatric, general medical, and normal controls (Group II). Group I was represented equally by vascular, neoplastic, traumatic, and convulsive disorders. Group II was largely represented by psychiatric and general medical cases. Classification was made on the basis of the lambda coefficients derived from Discriminant Function II. The same composite cut-off score was again employed (Z ≥ 35.92). The results are presented in Table 10 and show that roughly 79% of the Ss were correctly classified by this function (DF II). This second cross validation (DF II) showed considerably less shrinkage (3%) than the initial cross validation on DF I (14%). The results also showed a false positive rate of 16.67% and a valid 


10 The computer analyses were run by the staff 
of the Computing Center, University of Florida. 
TABLE 9 

PREDICTIVE CLASSIFICATIONS BY USE OF DISCRIMINANT FUNCTION II 
(1963, 1964 SAMPLES) RESTANDARDIZATION (N = 222) 

Interval composite scores    Nonorganics (N = 117)    Organics (N = 105) 
76-79                                                         1 
72-75                                                         4 
68-71                                                         6 
64-67                                                         9 
60-63                                                         3 
56-59                                 1                       4 
52-55                                                         7 
48-51                                 2                      14 
44-47                                 3                      12 
40-43                                 2                      11 
36-39                                 6                       9 
32-35                                 8                       6 
28-31                                33                       9 
24-27                                34                       7 
20-23                                25                       3 
16-19                                 3 

Note. Composite cut-off score: Z ≥ 35.92; overall hits = 82%, valid positives = 76%, false positives = 12%. 


positive rate of 72.13%. With respect to the false positive errors, the present cross validation revealed a much smaller increase in errors (4%) than occurred with the original cross validation (29%). In summary, Discriminant Function II demonstrated adequate predictive validity on a new sample of brain-lesion and control Ss, with only a slight increase in false positive and false negative rates of misclassification. 
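The two-decimal rates reported for this cross validation imply whole-number counts, which in turn give the overall hit rate. The sketch below recovers those counts by rounding; the helper name is illustrative, and the DF II restandardization hit rate of 82% is taken from the text.

```python
def count_from_rate(rate_percent, n):
    """Whole-number count implied by a percentage of a group of size n."""
    return round(rate_percent / 100 * n)


N_ORGANIC, N_NONORGANIC = 61, 90

valid_pos = count_from_rate(72.13, N_ORGANIC)      # organics correctly classified
false_pos = count_from_rate(16.67, N_NONORGANIC)   # nonorganics called organic
valid_neg = N_NONORGANIC - false_pos

overall = (valid_pos + valid_neg) / (N_ORGANIC + N_NONORGANIC)
shrinkage_points = 82 - round(overall * 100)       # against the DF II hit rate
print(valid_pos, false_pos, round(overall * 100), shrinkage_points)
```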

On the basis of these findings it was decided to combine the data of this cross-validated sample (N = 151) with the data on Discriminant Function II (N = 222), and to compute a new and final discriminant function based on even larger representative cases. 

Discriminant function HI: Restandardiza- 
tion (1966). Table 11 presents the means 
and differences between the organic (V = 
157) and nonorganie (N = 210) criterion 
groups on each of the discriminant predictor 
variables for the combined samples (1963- 
1966). Inspection of this table reveals 
Significant, group differences on each vari- 

4 Three Ss who were originally diagnosed as 


epileptic were reclassified as nonorganic on the 
basis of additional information provided by the 


TABLE 10 


Cross VALIDATION OF DISCRIMINANT 
Fonction II (1965 SAMPLE) 


Organics 


Interval composite 
scores (N = 61) 


Nonorganics 
(N = 90) 


80.92-85.91 
75.92-80.91 
70.92-75.91 
65.92-70.91 1 
60.92-65.91 
55.92-60.91 1 
50.92-55.91 1 
45.92-50.91 1 
40.92-45.91 6 
35.92-40.91 5 


E 
BOO Pb OUO b BO OO AHH 


30.92-35.91 24 
25.92-30.91 30 
20.92-25.91 17 
15.92-20.91 4 


Note—Composite cut-off score: Z = 35.92; 
overall hits = 79%, valid positives = 72%, false 
positives = 17%. 


able. Although: the neurological group was 
both older (¢ = 2.05, p < .05) and less 
intelligent (¢ = 2.15, p < .05) their mean 
level of intelligence was within the Normal 


TABLE 11
MEANS AND DIFFERENCES BETWEEN CRITERION
GROUPS ON THE PREDICTOR VARIABLES
RESTANDARDIZATION GROUP (DF III)

Variables         Nonorganic    Organic    t
Part A            3.67          11.76      9.56*
Part B            3.14          7.73       10.66*
Total             6.81          19.50      10.35*
Duplication       .53           3.11       4.48*
Angulation        .94           2.05       2.91*
Time              .64           1.43       2.23*
Age               37.00         41.06      2.05*
Performance IQ    101.20        94.65      2.15*

* All t values significant at p ≤ .05.


Range (X̄ = 94.65) and their mean age
was not high (X̄ = 41.66).
The discriminant function analysis was 


Neurology staff. The Neurology Service also 
recommended that six additional patients from 
the organic group be dropped temporarily from 
this study for lack of definitive neurological 
evaluation. These Ss were placed in a brain tumor 
suspect group for further outpatient work-up. 
TABLE 12
PREDICTIVE CLASSIFICATIONS BY USE OF
DISCRIMINANT FUNCTION III (1963-
1966) RESTANDARDIZATION
(N = 367)

Interval composite scores    Nonorganics    Organics
28.15-29.84                                 2
26.46-28.14                                 0
24.77-26.45                                 2
23.08-24.76                                 5
21.39-23.07                                 12
19.70-21.38                                 17
18.01-19.69                  2              7
16.32-18.00                  3              12
14.63-16.31                  4              15
12.94-14.62                  7              23
11.25-12.93                  6              14
9.56-11.24                   21             16
7.87-9.55                    43             10
6.18-7.86                    64             12
4.49-6.17                    56             8
2.80-4.48                    4              2

Note.—Composite cut-off score: Z = 11.25;
overall hits = 81%, valid positives = 70%, false
positives = 10%.


computed on an IBM 709 Computer with
the use of the UCLA BIMED program
(No. 005). Test of significance of the two
group mean composites was significant (F =
26.52, p < .001), suggesting good separation
between criterion groups on the composite
Z scores. The following lambda coefficients
were obtained:12 Part A (λ₁ = .9160), Part
B (λ₂ = 1.3824), Total (λ₃ = −.6601),
Duplication (λ₄ = .2991), Angulation (λ₅ =
.9270), Time (λ₆ = .4415), Age (λ₇ =
−.0289), and PIQ (λ₈ = .0500). The derived
composite cut-off score was Z > 11.25.13
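As a rough illustration of how such a discriminant function could be applied to a single record, the sketch below forms a composite Z as the weighted sum of the eight predictors and compares it with the cut-off. Several things here are assumptions rather than details given in the text: that the composite is a plain weighted sum of raw scores, that the Angulation coefficient reads .9270, that a score equal to the cut-off counts as positive, and the names `composite_z` and `classify`, which are hypothetical. The two profiles are simply the group means from Table 11.

```python
# Sketch only: assumes the composite Z is the plain weighted sum of the
# eight predictor scores, which the text implies but does not spell out.
LAMBDAS = {
    "part_a": 0.9160,
    "part_b": 1.3824,
    "total": -0.6601,
    "duplication": 0.2991,
    "angulation": 0.9270,  # assumed reading of the printed coefficient
    "time": 0.4415,
    "age": -0.0289,
    "piq": 0.0500,
}
CUTOFF = 11.25  # composite cut-off score reported for DF III


def composite_z(scores):
    """Weighted sum of predictor scores using the DF III lambda coefficients."""
    return sum(LAMBDAS[name] * scores[name] for name in LAMBDAS)


def classify(scores):
    """Label a profile 'organic' when its composite reaches the cut-off."""
    return "organic" if composite_z(scores) >= CUTOFF else "nonorganic"


# Group means from Table 11, used here only as two illustrative profiles.
NONORGANIC_MEANS = {
    "part_a": 3.67, "part_b": 3.14, "total": 6.81, "duplication": 0.53,
    "angulation": 0.94, "time": 0.64, "age": 37.00, "piq": 101.20,
}
ORGANIC_MEANS = {
    "part_a": 11.76, "part_b": 7.73, "total": 19.50, "duplication": 3.11,
    "angulation": 2.05, "time": 1.43, "age": 41.06, "piq": 94.65,
}

if __name__ == "__main__":
    for label, profile in (("nonorganic means", NONORGANIC_MEANS),
                           ("organic means", ORGANIC_MEANS)):
        print(label, round(composite_z(profile), 2), classify(profile))
```

Under these assumptions the two mean profiles fall on opposite sides of the cut-off. The composites need not match the group means reported elsewhere in the study exactly, since the scaling applied to the printed coefficients is only partly described.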


12 The magnitude and direction of the lambda
coefficients showed much variation between the 
separate discriminant function analyses (DF 
I-III). Most of the variation, however, occurred 
between DF I and DF II and was probably due to 
the lack of stability of the lambda estimates 
based on the smaller N in the standardization 
sample (DF I). When each of the coefficients 
(DF I-III) were converted to standard scores 
(separate analysis) it was found that the relative 
contributions of each variable remained essen- 
tially the same between DF II and DF III. The 
sample size for both of these standardization 
analyses was also appreciably higher. 

13 The three composite cut-off scores (DF I-III) 
varied primarily as a function of the way in which 


Table 12 presents the classification of Ss
on the basis of the new weightings (DF III).
Inspection of this table reveals that 81%
of the total restandardization group (N =
367) were correctly classified by this new
predictor function, with a false positive rate
of 10% and a valid positive rate of 70%.
In other words, 90% of the nonorganic con-
trols and 70% of the organics were correctly
classified by the restandardization function.
These findings, in summary, satisfy the
initial requirements concerning the predic-
tive validity of this instrument.


Base-Rate Considerations 


The purpose of the following analyses was 
to determine the utility or efficiency of the 
BRT (DF III) in settings in which extreme 
base-rate asymmetry might seriously affect 
the predictive validity of the instrument. 

The first analysis was addressed to the
problem of determining the probability of
correct classification, given a positive com-
posite Z score on the BRT, under various
theoretical base-rate populations. In more
practical terms, the question asked was as
follows: “How sure can you be on the basis
of a positive test sign?” This problem in-
volved the application of inverse probability
given by Bayes’ formula (Meehl & Rosen,
1955):


$$P_o = \frac{P \cdot p_1}{P \cdot p_1 + Q \cdot p_2} \qquad (2)$$


Pₒ = Probability that an individual is
organic, given that his test score is
positive.

P = Base rate of organics in the popula-
tion.

Q = Base rate of nonorganics in the
population.

p₁ = Proportion of organics identified by
test (“valid positive” rate).


the lambda coefficients were converted from the 
computer program. The lambda coefficients for 
DF II and DF III were each multiplied by a con- 
stant of 1000 to avoid excessive use of decimals. 
The coefficients for DF I, however, were each 
multiplied by 1000 and then divided by the lowest 
lambda coefficient to avoid any values less than 
unity. This conversion accounted for the higher
composite score obtained in this analysis (Z =
450.05).
p₂ = Proportion of nonorganics misclas-
sified by test (“false positive”
rate).

TABLE 13
NUMBER OF SUBJECTS CLASSIFIED AS ORGANIC AND CONTROL BY DISCRIMINANT FUNCTION III

                                 Criterion classification
Classification by DF III    Organic            Combined control    Total N classified by DF III
                            N      Percent     N      Percent
Organic                     109    70          22     10           131
Combined control            48     30          188    90           236
Total in class              157    100         210    100          367

Probability of correct classification when Z = 11.25 for base rates P and Q when p₁ = .70ᵃ and p₂ = .10ᵇ

Presumed base rates    Pₒ      Presumed base rates    Pₒ
P = .10  Q = .90       .44     P = .60  Q = .40       .91
P = .20  Q = .80       .64     P = .70  Q = .30       .94
P = .30  Q = .70       .75     P = .80  Q = .20       .97
P = .40  Q = .60       .82     P = .90  Q = .10       .98
P = .50  Q = .50       .88

ᵃ p₁ = Valid positive rate, P = base rate of actual organics.
ᵇ p₂ = False positive rate, Q = base rate of actual nonorganics.

The results of this analysis (Table 13)
indicate that under conditions of extreme
base-rate asymmetry in which the incidence
of brain damage is low (P = .10, Q = .90),
the diagnostician would be correct only 44
times in 100 when he predicted a brain lesion
on the basis of a positive test sign with this
instrument. This finding limits the useful-
ness of the BRT, with respect to this de-
cision, for settings in which the incidence of
brain injury is extremely low (e.g., mental
hygiene clinics). In all other populations,
however, the efficiency of this test would be
high. In fact, its greatest usefulness would
be in an inpatient medical setting (Pₒ =
.91) in which the incidence of brain disease
is higher (P = .60). The probabilities of
correct classification for various theoretical
base-rate populations are plotted in Figure
3 and show how the likelihood of correct
decisions increases markedly as the inci-
dence of brain disease increases in the popu-
lation (P > Q).
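The inverse-probability calculation behind these figures is short enough to sketch directly. The function below applies Bayes' formula with the valid positive rate (.70) and false positive rate (.10) reported for DF III; the function name `prob_organic_given_positive` is a hypothetical label chosen for this illustration, and Q is taken to be 1 − P.

```python
def prob_organic_given_positive(P, p1=0.70, p2=0.10):
    """Probability that a test-positive individual is organic (Bayes' formula).

    P  : base rate of organics in the population
    p1 : valid positive rate of the test
    p2 : false positive rate of the test
    """
    Q = 1.0 - P  # base rate of nonorganics
    return (P * p1) / (P * p1 + Q * p2)


if __name__ == "__main__":
    # Tabulate the posterior probability across the presumed base rates.
    for tenth in range(1, 10):
        P = tenth / 10
        print(f"P = {P:.2f}  Q = {1 - P:.2f}  "
              f"Po = {prob_organic_given_positive(P):.2f}")
```

Because the false positive rate is applied to the much larger nonorganic group when P is small, the posterior probability falls below one half at P = .10 even though the test itself is unchanged.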

The second analysis was concerned with
the percentage of correct classifications for
positive and nonpositive BRT scores on
projected samples (N = 1000) under vari-
ous base-rate populations. This analysis
provides a different framework for evaluat-
ing the potential usefulness of the BRT.




Table 14 presents the probable classifica- 
tion outcomes with this instrument for 1000 
consecutive samples in a setting in which 
the psychologist typically functions (P = 
.20, Q = .80). By merely using the base 
rates in this setting, the diagnostician would 
be correct 80% of the time. His strategy 
would involve the nondifferential prediction 
of “nonorganic” in each case without being 
burdened with the administration, scoring, 
and interpretation of tests. The diagnos- 


FIG. 3. Probability of correct classification
when Z = 11.25 (DF III) for combinations of base
rates P and Q when p₁ = .70 and p₂ = .10.
(Vertical axis: probability of correct classification;
horizontal axis: base rates, organics.)
TABLE 14
PROBABLE CLASSIFICATION OUTCOME WITH DISCRIMINANT FUNCTION III FOR 1000 PROJECTED
CASES WHEN P = .20 AND Q = .80

                                  Criterion classification
Classification by function    Organic            Combined control    Total N classified by function
                              N      Percent     N      Percent
Organic                       140    70          80     10           220
Combined control              60     30          720    90           780
Total in class                200    100         800    100          1000

Note.—Valid positives = .70, false positives = .10. P = base rate of actual organics.


tician, however, who employed the BRT
(DF III) systematically in this setting,
would find his time worth the effort. When
predicting “nonorganic” he would be correct
92% of the time (720/780). This represents
a 12% increase in accuracy for the same
type of decision (valid negative) when em-
ploying the BRT. Furthermore, the base
rates alone would not detect a single brain
lesion in this setting, whereas the test would


early detection of brain disease is considered 
essential to the medical treatment of a pa- 
tient, then false negatives would represent 
more serious errors than false positives. In 
this case, the use of base rates would fail to 
detect a single brain lesion despite fair 
overall predictive validity (Hy = 65%). On 
the other hand, the BRT would demonstrate 
better classification whether the psycholo- 
gist predicted “organic” or “nonorganic,” 


TABLE 15
PROBABLE CLASSIFICATION OUTCOME WITH DISCRIMINANT FUNCTION III FOR 1000 PROJECTED
CASES WHEN P = .35 AND Q = .65

                                  Criterion classification
Classification by function    Organic            Combined control    Total N classified by function
                              N      Percent     N      Percent
Organic                       245    70          65     10           310
Combined control              105    30          585    90           690
Total in class                350    100         650    100          1000

Note.—Valid positives = .70, false positives = .10. P = base rate of actual organics.


result in the correct classification of 140
brain-injured Ss (Pₒ = .64).
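The projected-sample tables can be regenerated from three inputs: the base rate, the valid positive rate, and the false positive rate. The sketch below is one way to do that; the function `projected_outcomes` and its dictionary keys are hypothetical names introduced for this illustration, and the rates are held at the DF III values of .70 and .10.

```python
def projected_outcomes(P, n=1000, p1=0.70, p2=0.10):
    """Project classification outcomes for n cases at organic base rate P.

    p1 is the valid positive rate and p2 the false positive rate.
    """
    organics = P * n
    controls = n - organics
    valid_pos = p1 * organics          # organics called organic
    false_neg = organics - valid_pos   # organics called nonorganic
    false_pos = p2 * controls          # controls called organic
    valid_neg = controls - false_pos   # controls called nonorganic
    return {
        "valid_positives": valid_pos,
        "false_negatives": false_neg,
        "false_positives": false_pos,
        "valid_negatives": valid_neg,
        # Accuracy of each kind of prediction made with the test.
        "correct_when_predicting_organic": valid_pos / (valid_pos + false_pos),
        "correct_when_predicting_nonorganic": valid_neg / (valid_neg + false_neg),
        # Accuracy of always predicting "nonorganic" from base rates alone.
        "base_rate_accuracy": controls / n,
    }


if __name__ == "__main__":
    for P in (0.20, 0.35):
        print(P, projected_outcomes(P))
```

Comparing `correct_when_predicting_nonorganic` with `base_rate_accuracy` gives the gain for the "nonorganic" decision, while `valid_positives` counts the organic cases that a base-rate strategy would miss entirely.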

Table 15 presents the probable classifica-
tion outcomes (N = 1000) with this instru-
ment in a typical medical inpatient setting
similar to the present standardization popu-
lation (P = .35, Q = .65). By using the base
rates in this setting, the diagnostician would
be correct 65% of the time by merely pre-
dicting “nonorganic” in each case. The
BRT, however, would result in a 20% higher
rate of classification for the same decision
(585/690 = 85%). Furthermore, roughly 80%
of the test's “organic” predictions would be
correct (245/310). Application of the base
rates in this setting would not result in the
identification of a single organic case! If the


and would furthermore allow him to make 
some contribution in the setting in which 
he functioned. 


Cost Efficiency and Test Prediction 


Assuming that differential error risks are 
involved between false negative and false 
positive decisions in organic brain assess- 
ment, then it would seem that the efficiency 
of any test should be evaluated further in 
terms of the relative costs of these two types 
of decision errors, and not merely against 
the prevailing base rates. The base rates 
assume implicitly that both types of decision 
errors are equally bad. In the preceding dis- 
cussion, for example, it was felt that the 
failure to detect brain damage (a false nega-
tive error) was seemingly a more serious
risk than to misclassify a nonorganic person
(a false positive error), in that the failure 
to detect brain disorder could have serious 
implications with respect to the life of a 
human being, not to mention the additional 
medical expenses and repeated hospitaliza- 
tions necessary if the disease process should 
remain undiagnosed. Furthermore, in view 
of the fact that the decision to employ the 
base rates would most likely result in the 
strategy to predict in favor of the higher Q 
values (i.e., when P < Q), then this com-
parison with the base rates might well lead
to an incorrect decision regarding the ac- 
ceptance or rejection of a given test. The 
reason is that the use of the base rates in 
this case would result in an absolute reduc- 
tion of false positive errors at the expense 
of the more serious false negative errors. 

Rimm (1963) has shown that some at- 
tempts to consider the relative error costs in 
dollars in test prediction can often reverse 
the decision regarding the rejection of a given 
test, despite unfavorable comparison with 
the base rates. The method he proposed in-
cluded a utilization of the base rates (P and
Q), the discriminatory efficiency of the test
(p₁ and p₂), and a relatively simple formula
involving these differential error costs.14
This formula was defined as follows:

$$\text{Cost efficiency} = p_1 - \frac{R \cdot Q \cdot p_2}{P} \qquad (3)$$


in which R represents the ratio of the cost 
of a false positive error to the cost of a false 
negative error. 

In order to derive a numerical translation 
of these relative error costs, it was assumed 
that a false negative error would result in 
the long run in twice the investment of 
money and professional man hours as com- 
pared with a false positive error. Therefore 
the dollar cost ratio of a false positive to a 
false negative error was set at 1:2; that is, 
R was equal to 1/2 or .5.

To reexamine the BRT (DF III), given a
valid positive rate of p₁ = .70 and a false


14 The reader is referred to this interesting
article for a more detailed explanation of the 
problem and the mathematical steps involved 
(Rimm, 1963). 


positive rate of p₂ = .10, for the more typ-
ical clinical setting in which the incidence of
brain disorder is P = .20, the cost effici-
ency of this predictor function would be:

$$.70 - \frac{.50(.80)(.10)}{.20} = .50$$


This value means that for every dollar 
that would have been spent paying for the 
base rate errors resulting from the prediction 
of “nonorganic” in every case, .50 dollar 
would have been saved as a result of employ- 
ing the BRT (DF III). Or, in other words, 
for every dollar spent as a result of errors
obtained by using the base rates, 1 − .50 =
.50 dollar would have been spent if the
BRT had been used instead. Equally strik-
ing are the results for populations in which 
the incidence of brain disease was extremely 
low (P = .10). These findings are reported 
in Figure 4. The cost efficiency of the BRT, 
when P = .10, was .25, which indicates that 
even under extreme base rate asymmetry it 
would have been more efficient to employ 
the BRT. Figure 4 also shows that the 
superiority of the BRT increases as the in- 
cidence of brain disease increases in the 
population (P > Q). 
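The two cost efficiency values quoted here (.50 at P = .20 and .25 at P = .10) are consistent with the expression p₁ − R·Q·p₂/P, and the sketch below checks that reading against a direct tally of error costs. The closed form is inferred from the worked values in the text rather than copied from Rimm (1963), and the function names are hypothetical. The cost of a false negative is set to one unit, so a false positive costs R units.

```python
def cost_efficiency(P, p1=0.70, p2=0.10, R=0.5):
    """Fraction of the base-rate error cost saved by using the test.

    R is the cost of a false positive relative to a false negative.
    """
    Q = 1.0 - P
    return p1 - (R * Q * p2) / P


def saving_from_error_costs(P, p1=0.70, p2=0.10, R=0.5):
    """Same quantity, derived by tallying the cost of each strategy."""
    Q = 1.0 - P
    # Always predicting "nonorganic" misses every organic case.
    base_rate_cost = P * 1.0
    # The test misses (1 - p1) of organics and flags p2 of nonorganics.
    test_cost = P * (1.0 - p1) * 1.0 + Q * p2 * R
    return 1.0 - test_cost / base_rate_cost


if __name__ == "__main__":
    for P in (0.10, 0.20, 0.35, 0.50):
        print(P, round(cost_efficiency(P), 3),
              round(saving_from_error_costs(P), 3))
```

A positive value means the test's errors cost less than those of the base-rate strategy; the saving grows with P because the misses avoided by the test then dominate the added false positives.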


Composition and Classification of the Criterion 
Groups 


Table 16 presents the major subgroups
within the organic and nonorganic criterion
groups and the classification frequencies of
each subgroup based on the discriminant
function composite scores (DF III). The
cutting line derived from analysis of DF III
is presented to show the differential classifi-
cation rates between subgroups. The non-
organic group was largely composed of
psychiatric (N = 120) and medical dis-
orders (N = 51) and a smaller group of
normal Ss (N = 39). Roughly 35% of the
psychiatric subgroup was composed of
schizophrenics. The highest rate of correct
classification was obtained with the normal
(97%) and psychiatric groups (90%). It was
the medical subgroup that showed the great-
est amount of misclassification (18%). Head-
aches, nausea, vomiting, and dizziness were
the most characteristic presenting symptoms
TABLE 16
PREDICTIVE CLASSIFICATIONS FOR SUBGROUPS WITHIN THE ORGANIC AND NONORGANIC
CRITERION GROUPS BY USE OF DISCRIMINANT FUNCTION III

Organic (N = 157): SX, Neop, Vasc, Traum, ASCVD, Drug, Other. Nonorganic (N = 210): Norm, Med, Psych.

Interval composite scores SX Neop Vasc Traum ASCVD Drug Other Norm Med Psych
28.15-29.84 2 
26.46-28.14 
24.77-26.45 T 1 
23.08-24.76 1 2 2 
21.39-23.07 1 3 3 1 3 1 
19.70-21.38 1 1 4 3 7 T 
18.01-19.69 2 2 2 1 1 1 
16.32-18.00 3 1 6 1 H 1 2 
14.63-16.31 H 2 4 3 4 1 1 3 
12.94-14.62 6 1 4 4 4 2 2 4 3 
11.25-12.93 3 5 3 3 3 3 
9.56-11.24 4 4 3 3 2 3 5 13 
7.87-9.55 à 1 2 3 3 % 11 25 
6.18-7.86 6 2 1 2 1 13 15 36 
4.49-6.17. 4 1 1 1 ih 15 10 31 
2.80-4.48 1 1 1 3 
Total 32 19 30 21 37 10 8 39 51 120 
Hits 16 11 25 16 30 6 5 38 42 108
Misses 16 8 5 5 7 4 3 1 9 12 


Note.—SX = seizures, Neop = neoplasms, Vasc = vascular, Traum = traumatic, ASCVD = arterio- 
sclerotic-vascular, Norm = normals, Med = medical, Psych = psychiatric. 


in this group, although neurological exam- 
ination failed to document the presence of 
brain disease in these patients. 

The organic group was composed largely 
of vascular (N = 30), neoplastic (N = 19), 


FIG. 4. Cost efficiency probabilities of DF III
for combinations of base rates P and Q when
p₁ = .70 and p₂ = .10. (Vertical axis: probable
cost efficiency; horizontal axis: base rates, organics.)


traumatic (N = 21), arteriosclerotic (N =
37), and convulsive disorders (N = 32).
Inspection of this table shows that the high-
est rate of correct classification occurred
with the vascular (83%) and arteriosclerotic
disorders (81%). On the other hand, the
largest percentage of misclassification (false
negatives) occurred with the convulsive dis-
orders (50%). In order to determine more
clearly the effects of disease classification on
test performance, an analysis of variance
was then computed on the discriminant
composite scores for each subgroup (Hy-
pothesis Eq. 3).


Analysis of Classification of Brain Disease 


The mean discriminant composite scores 
for the different lesion groups are presented 
in Table 17. Although each of the composite 
means was above the discriminant cutting 
line (Z = 11.25), the mean of the convulsive 
group fell only slightly above this value 
(Z = 11.58). An analysis of variance (Table 
17) on the composite means revealed an 
overall separation between groups (F = 
2.51, p < .02). Additional tests between 
TABLE 17
DISCRIMINANT FUNCTION (III) COMPOSITE MEANS BY CLASSIFICATION OF BRAIN DISORDER

Type              SX       Neop     Vasc     Traum    ASCVD    Drug     Other
Composite mean    11.58    14.74    16.09    14.07    15.97    15.28    13.76
N                 32       19       30       21       37       10       8

Analysis of variance of composite predictor scores

Source     SS         df     MS       F       p
Between    442.89     6      73.82    2.51    <.02
Within     4416.21    150    29.44
Total      4859.10    156


individual group means, however, showed 
that these differences were largely due to 
the effects of the convulsive group. These 
findings lend only partial support for the 
hypothesis that classification of lesion is a 
meaningful independent variable within the 
concept of “brain damage.” 
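The arithmetic of the one-way analysis of variance in Table 17 can be retraced from the printed sums of squares and, approximately, from the group means and sizes. The sketch below is illustrative only: the function names are hypothetical, and because the group means are rounded to two decimals the recomputed between-groups sum of squares can only be expected to land near the tabled value.

```python
def f_ratio(ss_between, df_between, ss_within, df_within):
    """Return (MS between, MS within, F) for a one-way analysis of variance."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within


def between_groups_ss(means, sizes):
    """Between-groups sum of squares from group means and group sizes."""
    total_n = sum(sizes)
    grand_mean = sum(m * n for m, n in zip(means, sizes)) / total_n
    return sum(n * (m - grand_mean) ** 2 for m, n in zip(means, sizes))


# Composite means and group sizes as printed in Table 17
# (SX, Neop, Vasc, Traum, ASCVD, Drug, Other).
MEANS = [11.58, 14.74, 16.09, 14.07, 15.97, 15.28, 13.76]
SIZES = [32, 19, 30, 21, 37, 10, 8]

if __name__ == "__main__":
    df_between = len(SIZES) - 1           # number of groups minus one
    df_within = sum(SIZES) - len(SIZES)   # total N minus number of groups
    print(df_between, df_within)
    print(f_ratio(442.89, df_between, 4416.21, df_within))
    print(between_groups_ss(MEANS, SIZES))
```

The degrees of freedom follow from seven lesion groups and 157 organic Ss, and the F ratio is simply the ratio of the two mean squares.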


Analysis of Generalized Effects 


This analysis was essentially an extension 
of the earlier analysis addressed to the 
problem of whether the composite scores 
were independent of area of involvement. 
In this analysis Ss were classified only ac- 
cording to laterality of lesion, with 34 left 
hemisphere and 34 right hemisphere cases. 
The remaining group of indeterminate le- 
sions (N = 89) consisted largely of bilateral 
and diffuse brain-lesion cases. These data
are presented in Table 18. Disease classifica-
tion was controlled for between left and
right hemispheric cases due to the effects

TABLE 18
DISCRIMINANT FUNCTION (III) COMPOSITE
MEANS BY AREA OF BRAIN
INVOLVEMENT

Area of damage              N     Composite
Left hemisphere             34    14.62
Right hemisphere            34    14.60
Indeterminate hemisphere    89    14.48

Analysis of variance of composite predictor scores

Source     SS         df     MS       F       p
Between    .61        2      .31      .000    ns
Within     4858.49    154    31.55
Total      4859.10    156


of this variable on test performance. In other 
words, both hemispheric groups had the 
same frequency of vascular, neoplastic, 
traumatic, arteriosclerotic, and convulsive 
disorders. Inspection of Table 18 shows that 
the discriminant composite means were all 
well above the cutting line (Z = 11.25). 
Analysis of variance of the composite means, 
however, showed no overall differentiation 
between area of involvement (F < 1). In 
fact the composite mean for the left hemi- 
sphere group (Z = 14.62) was almost the 
same as for the right hemisphere group (Z = 
14.60). This finding lends additional support 
for Hypothesis 2. 


Discussion 


The present findings lend considerable 
support for the predictive validity of the 
BRT. Although classification shrinkage oc- 
curred after the cross validation of DF I, 
the subsequent cross validation and re- 
standardization analyses revealed a more 
stable level of predictive classification. This 
was probably due to increased representa- 
tion of cases within both criterion groups 
which helped to stabilize the lambdas for 
each of the predictor variables. More im- 
pressive, however, is the fact that the initial 
cross validation (DF I) showed only a 14% 
shrinkage in overall hits despite the fact 
that a new population was sampled, differ- 
ent examiners were used, and further, that 
the original standardization (DF I) was
based on a relatively small sample of cases
(N = 122). The initial cross validation also
revealed that the false positive errors ac-
counted for most of this shrinkage, varying
from 5% on the original standardization
(DF I) to 29% after cross validation (DF I).
On the subsequent validations, however, the
false positive rate never went beyond 17%,
and finally stabilized around 10% on the
final restandardization (DF III).

The false negative rate, on the other hand, 
was more consistent between each of the 
validation analyses, varying from 19% on 
the initial standardization (DF I) to 30% on 
the final restandardization (DF III). These 
errors, however, accounted for the majority 
of misclassifications throughout the study. 

Further analysis of the false negative mis-
classifications, on the entire brain-injured 
standardization group (DF III), demon- 
strated clearly that the classification of brain 
disorder was a significant independent 
variable within the neurological group. It is 
evident from Tables 16 and 17 that con- 
vulsive disorders accounted for the majority 
of false negative errors. In fact, a separate 
analysis of hits within the seizure group, 
according to age, revealed that 85% of the 
epileptics below age 35 were missed by the 
BRT. It was only in the higher age ranges 
that the BRT correctly classified the ma- 
jority of seizure disorders. These latter cases 
were largely selected from the initial Vet- 
erans Administration standardization popu- 
lation (DF I). This finding suggests that
age, that is, length of seizures or chronicity,
might lead to more generalized brain dys-
function in man. With respect to the high
incidence of false negatives within the con-
vulsive group it should also be pointed out
that this disorder seldom results in any
demonstrable structural change in the brain.
The neurological studies on these patients 
in the present study (e.g., arteriography, 
pneumography, brain scans, etc.) were 
uniformly negative. Their only positive 
neurological findings were EEG abnormali- 
ties and a history of fits. On the basis of 
these findings, one could argue against the 
inclusion of convulsive disorders in the 
neurological group, particularly at ages 
under 35. This procedure would, in turn, 
increase the percentage of valid positives, 
and restrict the purpose of the test to or- 
ganic disorders involving structural changes 
in the brain. This is essentially the procedure 
that neurology and neurosurgery follow in 


evaluating convulsive disorders, that is, to 
determine whether the seizures are second- 
ary to neoplastic, vascular, or traumatic 
disease. 

Similar problems were encountered in the 
analysis of false negative misclassifications 
within the neoplastic group. Although the 
majority of neoplastic lesions were correctly 
classified, five of the eight misclassifications 
in this group occurred with lesions not 
involving the cerebral cortex (i.e., pituitary 
and cerebellar tumors), and one error was 
associated with a small meningioma. In 
other words, only two neoplastic lesions, 
involving the cerebral cortex, were mis- 
classified with the BRT (DF III). Future 
research might likewise dictate that this 
instrument be restricted to lesions involving 
the cerebral cortical structures. Similar 
diagnostic procedures are followed in medi- 
cine. The majority of laboratory tests in 
neurology, for example, are designed for 
specific types and/or areas of brain pa- 
thology (Merritt, 1959). Furthermore, it is 
with cortical lesions that many of these 
laboratory tests have shown an increase in 
false negative classification (Brosin, 1959). 

The highest percentage of valid positives 
occurred within the vascular and arterio- 
sclerotic-degenerative (ASCVD) brain-lesion 
groups (81 %). This was probably due to the 
infrequency of focal disease in these patients. 
The majority of Ss within the vascular group 
had widespread involvement associated with 
one hemisphere whereas many of the Ss 
within the ASCVD group had diffuse cere- 
bral involvement. 

The analysis on types of brain disorder 
provided further support for the importance 
of isolating other variables within the con- 
cept of “brain disorder." Acute brain lesions 
showed greater impairment on their com- 
posite Z scores (DF I) than did the rela- 
tively static brain-lesion group. Although 
the difference between the acute and chronic-
static groups was not significant, the trend
was in the direction predicted. The inclusion 
of prefrontal cases within the chronic-static 
group in the present study might have 
contributed to the suppression of differ- 
ences between these two groups. In the 
Fitzhugh et al. study (1961), chronic epi- 
leptics comprised the majority of patients
within the chronic-static group. Apparently
the effects of prefrontal damage were greater 
than the effects of longstanding epilepsy. 
The fact that the majority of classification 
errors occurred within the relatively static 
group (DF I) is important with respect to 
the selection of neurological patients in the 
standardization of a predictive test of brain 
dysfunction. The present findings showed 
that test performance varied as a function 
of both type and classification of brain dis- 
order. Failure to control for the effects of 
these variables could therefore bias the 
outcome of any predictive validation study. 

Area of brain involvement, on the other 
hand, was not shown to be related to test 
performance. The mean predictor composites 
(Tables 6 and 18) were well above the cutting 
line in both analyses, and they failed to 
reveal any overall difference whether damage 
occurred in the frontal regions, left temporo- 
parietal regions or right temporoparietal 
regions. These findings were consistent with 
previous results which have shown non- 
specific effects on complex visuo-perceptual 
tests (Teuber, 1959; Teuber & Weinstein, 
1956). Teuber (1959) hypothesized that 
performance on these complex perceptual 
measures depends on a number of different 
psychologic functions, each of which is
crucial. Hence, any lesion sufficient to affect 
one of these functions may lead to a signifi- 
cant general impairment. Three postulated 
levels of performance on the BRT were 
separately analyzed in an earlier study 
(Satz, 1966). Level I was defined as the
perception of the stimulus design as pre- 
sented by E. Level II was defined as per- 
ception of the rotated stimulus image; and 
Level III was defined as motoric translation 
of the rotated perceptual image. These three 
levels apparently interact in that an im- 
pairment at Level I, that is, an inability to 
correctly perceive the stimulus design, leads 
to a breakdown in performance at the two 
higher levels. In like manner, an error at 
Levels II or III does not, however, necessi- 
tate an error at Level I; for the S may be 
able to correctly perceive the initial stimu- 
lus design, but may not be able to perform 
correctly at the two more difficult levels 
(Levels II and III). Evidence for this test 
behavior was found in the present study in 


which several of the organic Ss failed by 
merely reproducing E's stimulus designs 
(Duplication error). A reproduction of the
stimulus design, however, indicates, by 
definition, correct perception at Level I. 
This type of error performance parallels 
rather closely the concept of “stimulus 
boundedness” cited by other investigators 
(Goldstein, 1959; Hemmendinger, 1953). It 
also suggests that different levels of per- 
ception might be involved in performance in 
a task such as the BRT; first, a lower level 
recognition-discrimination system; and sec- 
ond, a higher level system involving more 
complex integrative perceptual processes. 
Bortner and Birch (1960, 1962) have re- 
cently advanced support for a similar 
hypothesis of levels in perception. The pre- 
ceding analysis of the BRT suggests that 
an error at any of the three levels will lead 
to general impairment on any test item. If 
it could be demonstrated that different 
brain lesions have different effects on each of 
these levels, then one could account for the 
nonspecific effects obtained with this test. 
A lesion sufficient to upset one of the postu- 
lated levels would lead to significant general 
impairment. 

An additional part of this study was ad-
dressed to the utility or efficiency of the 
BRT within various theoretical base rate 
populations. The results showed rather 
clearly the effects of extreme base rate 
asymmetry (P < Q) on this instrument. In
settings in which the incidence of brain dis- 
ease was low (P < .20), the decision to 
employ the BRT would depend on a number 
of factors in addition to overall hits, per- 
centage of valid positives and percentage of 
false positives. One of the factors involved 
the varying incidence of brain disease in 
different clinical settings (i.e., base rates). 
If, for example, the incidence of brain disease 
was low in a particular setting (P = .20,
Q = .80), the diagnostician could be assured
of an overall hit rate of 80% by merely using
the higher base rate value (i.e., Q) and pre-
dicting “nonorganic” in each case. This
procedure would require no test adminis- 
tration, scoring, or interpretive time. Fur- 
thermore, the percentage of overall hits 
would appear to be approximately the same 
whether the test (DF III) or base rates were
employed (81% versus 80%). If the inci-
dence of brain disease was even lower (i.e.,
P = .10), the advantages of employing the 
base rates would appear even more striking 
(81 % versus 90 %). However, the percentage 
of overall hits (H4) is a value that is often 
misleading in test prediction and which can 
be spuriously inflated by a much larger N 
in the criterion group with the higher valid 
positive or valid negative rate. That is why 
it is essential to have information on the 
percentage of valid positives and percent- 
age of false positives for any predictive test. 
Furthermore, in comparing the relative 
efficiency of a test against the prevailing 
base rates, it is additionally important to 
determine the predictive outcomes on 
projected samples. This approach was fol- 
lowed in the present study (Tables 14 and 
15). For the setting in which the incidence 
of brain disease was low (P = .20, Q = .80), 
it was shown that it would be risky to con- 
clude that the base rates were superior to 
the BRT (DF III) on the basis of overall 
hits (Table 14). When the test was examined 
on 1,000 projected cases, given P = .20, it
was discovered that the diagnostician who 
employed the test systematically in this 
setting would find his time well spent. When 
predicting “nonorganic” he would be correct
92% of the time. This outcome represented 
a 12% increase in accuracy over the base 
rates for the same decision statement. 
Further, the base rates would not detect a 
single brain lesion in this setting, whereas 
the test would correctly identify 140 brain- 
injured Ss. Similar, although less striking, 
advantages of the test were found in the 
more extreme base rate populations (P =
.10, Q = .90). In other words, the original
comparison between the BRT and base 
rates, on overall hits (81 % versus 80 %), was 
shown to be misleading without additional 
analysis on projected samples involving 
both percentage of valid positives and per-
centage of false positives. 
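
The projected-sample arithmetic described above can be sketched as follows. This is an illustrative reconstruction, not the authors' procedure: the valid positive rate of .70 is an assumption inferred from the 140 of 200 brain-injured Ss identified at P = .20, the false positive rate of .10 is the value cited in the discussion, and the function name is hypothetical.

```python
def project_cases(p_organic, valid_pos=0.70, false_pos=0.10, n=1000):
    """Project test outcomes onto n cases with base rate p_organic.

    valid_pos and false_pos are assumed rates (see the lead-in).
    """
    organic = n * p_organic
    nonorganic = n - organic
    hits = organic * valid_pos               # brain-injured Ss detected
    misses = organic - hits                  # false negatives
    false_alarms = nonorganic * false_pos    # false positives
    correct_rejections = nonorganic - false_alarms
    return hits, misses, false_alarms, correct_rejections


hits, misses, false_alarms, correct_rejections = project_cases(0.20)

# Accuracy of the decision statement "nonorganic" when the test is used.
accuracy_nonorganic = correct_rejections / (correct_rejections + misses)

# Calling every case "nonorganic" (the base rate strategy) is right Q of the time.
base_rate_accuracy = 1 - 0.20

print(round(hits), round(accuracy_nonorganic, 3), base_rate_accuracy)
```

Under these assumed rates the sketch reproduces the figures quoted in the text: 140 detected cases and roughly 92% accuracy for the "nonorganic" decision, against 80% for the base rates.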

A second approach, for determining the 
relative efficiency of the test, was addressed 
to the more practical question: How sure 
could you be on the basis of a positive test 
sign with this instrument under varying 
base rate populations? This problem has 
been discussed by Meehl and Rosen (1955)
and involved the application of inverse
probability. The results of this analysis 
(Figure 3) showed that only in settings in 
which the incidence of brain disease was 
extremely low (P = .10, Q = .90) would the 
diagnostician be wrong more often than 
right in predicting the presence of brain dis- 
order, given a positive score on the BRT 
(Z = 11.25). In all other clinical settings 
the diagnostician could be quite confident 
of his prediction of brain dysfunction on the 
basis of a positive composite score with this 
instrument. In other words, the risk of a 
false positive error was shown to be more 
likely in settings similar to mental health 
clinics in which the incidence of brain dis- 
ease is low. In other settings, particularly 
hospitals and inpatient medical services, 
the probability of being correct on the basis 
of a positive BRT score would be high. 
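
The inverse probability computation can be illustrated with Bayes' rule. The sketch below is hedged: the valid positive rate (.70) and false positive rate (.10) are assumptions inferred from figures quoted elsewhere in the discussion, not values taken from Figure 3, and the function name is hypothetical.

```python
def prob_organic_given_positive(p, valid_pos=0.70, false_pos=0.10):
    """P(brain disorder | positive test sign) for base rate p (Q = 1 - p)."""
    q = 1.0 - p
    return (p * valid_pos) / (p * valid_pos + q * false_pos)


# Probability of being correct on a positive sign under varying base rates.
for p in (0.10, 0.20, 0.50, 0.80):
    print(f"P = {p:.2f}: {prob_organic_given_positive(p):.2f}")
```

With these assumed rates, only the P = .10 setting yields a value below .50, which agrees with the statement that the diagnostician would be wrong more often than right only where the incidence of brain disease is extremely low.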

A third approach, for determining the 
utility or efficiency of the BRT, was ad- 
dressed to the differential risks between 
false positive and false negative decision 
errors. For example, in settings in which the 
incidence of brain disease was low (P < Q), 
the strategy to employ the higher base rate 
value (i.e., Q) would have resulted in an
absolute reduction of false positive errors 
at the expense of the false negative errors. 
In other words, without the test, not a single 
brain-lesion case would have been detected. 
This decision error (false negative), however, 
was felt to represent a more serious risk 
than the misclassification of a normal person 
(false positive) because the failure to detect 
the presence of brain disease could lead to 
grave consequences and additional cost for 
an individual should the disease process 
remain undiagnosed. Implicit to the base 
rates is the assumption that both decision 
errors are equally risky. Rimm (1963) has 
shown that attempts to consider the relative 
error costs in dollars for predictive tests 
can often reverse the decision regarding the 
rejection of a given test, despite unfavorable 
comparison with the base rates. In the pres- 
ent study the attempt to translate the ratio 
of these differential error risks into numeri- 
cal valuation provided the third framework 
for evaluating the efficiency of the BRT. 
These cost efficiency analyses provided addi-
tional support for the superiority of the
BRT, even for settings in which the inci-
dence of brain disease was extremely low
(P = .10, Q = .90). 
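
The effect of unequal error costs can be sketched numerically. Everything in this example is an assumption for illustration: the source gives no cost values in this passage, so the cost ratios are hypothetical, and the valid positive (.70) and false positive (.10) rates are inferred from figures quoted elsewhere in the discussion.

```python
def compare_costs(p, cost_ratio, valid_pos=0.70, false_pos=0.10, n=1000):
    """Total error cost of the test versus the base rate strategy.

    cost_ratio is the (hypothetical) cost of one false negative expressed
    in units of one false positive. Assumes P < Q, so the base rate
    strategy labels every case "nonorganic".
    """
    organic = n * p
    nonorganic = n - organic
    test_cost = organic * (1 - valid_pos) * cost_ratio + nonorganic * false_pos
    base_cost = organic * cost_ratio  # every brain-lesion case is missed
    return test_cost, base_cost


for ratio in (1.0, 2.0, 5.0):
    test_cost, base_cost = compare_costs(0.10, ratio)
    print(f"cost ratio {ratio}: test = {test_cost:.0f}, base rates = {base_cost:.0f}")
```

Under these assumptions the base rates win when the two errors are weighted equally at P = .10, and the test wins once a false negative is weighted somewhat more heavily than a false positive, which is the kind of reversal the text attributes to cost considerations.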

The final interpretations with respect to 
the usefulness of this predictive instrument 
must perforce be left to the decision of the 
particular diagnostician. The preceding 
analyses on predictive efficiency merely 
represented different frameworks for asking 
a number of different questions about the 
possible usefulness of this test. Although the 
predictive validity of the final restandardiza-
tion was high (DF III), it was shown that
the efficiency of the BRT was somewhat 
limited by the effects of extreme base rate 
asymmetry (P << Q), but only when the
psychologist was concerned about the prob-
ability of being correct on the basis of a
positive test sign. This finding should be of 
some concern to psychologists working in 
community mental health centers. On the 
other hand, if it was decided that in this 
setting the risk of false negative error was 
more costly than a false positive error, then 
the strategy to employ this test could be 
justified. The diagnostician could at least 
be confident that his false positive rate would 
be small (i.e., 10%) and, should the more
serious problem of brain pathology be 
present, he would, in the majority of cases, 
detect it. But again, these decisions would 
have to be weighed against the additional 
time and expense involved in test adminis- 
tration and scoring. 

Before concluding it should be emphasized 
that the predictive validity of this multi-
variate instrument is by no means fully
demonstrated. Additional cross-validational
studies are still needed, preferably in other 
settings. More serious is the problem of 
defining clearly the predictive purposes of 
the test, particularly with respect to classifi- 
cation and area of brain involvement. Should 
the criterion definitions pertain only to those 
organic brain lesions which cause structural 
damage restricted to the cerebral cortex? 
This criterion specification would obviously 
exclude the majority of convulsive disorders 
and neoplasms situated in lower centers of 
the brain. More research is needed before 
these problems can be resolved. 

A final word should be given to objections 
which might arise on the utility of an in- 
strument that purports to measure only the 
presence or absence of brain disorder. The 
point is that before the psychologist can 
begin to make second-level inferences con- 
cerning the laterality or localization of brain 
lesions, he must first demonstrate a strong 
likelihood for the presence of brain dys- 
function. Only then can he proceed, using 
additional tests, to other levels of decision 
inference. The decision process in neurology, 
on the other hand, is not always restricted to 
a step-wise level of inferences. There are test 
procedures (e.g., angiography) in which 
visualization of the lesion leads, by defini- 
tion, to detection and often provides addi- 
tional information concerning the type of 
lesion and whether or not it is localized. The 
present investigation was carried out to 
examine the complexities of the initial stages 
of this decision process for the psychologist.
SUBSTANTIVE DIMENSIONS OF SELF-REPORT IN THE
MMPI ITEM POOL 


JERRY S. WIGGINS


University of Illinois 


Starting with the original item-content classifications of Hathaway and 
McKinley, both psychometric and intuitive procedures were employed in 
the development of a set of scales designed to be internally consistent, 
moderately independent, and representative of the major substantive 
clusters that appeared to exist in the total MMPI item pool. Although not 
constructed by the strategy of contrasted groups, the 13 MMPI content 
scales nevertheless exhibited significant variability of mean scale scores
when diverse normal, college, and psychiatric populations were compared.
Moreover, when multivariate analytic procedures were employed in the 
diagnostic classification of psychiatric inpatients, the content scales were 
found to be as promising as the conventional clinical scales which were 
derived for this specific purpose. The factorial structure of the content 
scales was related to that of the clinical scales and found to lend support 
to the interpretation of the first 2 factors of the MMPI as “ego re- 
siliency” and “control.” An example was provided of the manner in which 
content scales may be employed as a supplement to interpretation of the
MMPI clinical scales.


The concept of item content has en-
joyed neither precise specification nor
active empirical exploration in the recent
history of objective personality assessment. 
This situation may, in part, be attributed 
to the general disrepute into which “face 
validity” has fallen as a validity criterion 
(APA, 1954; Stagner, 1958) and to the 
tendency to regard responses to ambiguous 
items as reflecting “dynamic” aspects of 
personality. Meehl’s (1945) now classic 
empirical manifesto raised the hope that 
dynamic aspects of personality might be 
assessed by means of true-false item pools 
superficially bearing little resemblance to 
the criterion at hand. A not infrequently 
encountered corollary of this belief is the
superstition that knowledge of the content
of an empirical scale may somehow vitiate
the mysterious mediating process that links
scale scores to empirical criteria.

The writer is indebted to Lewis R. Goldberg
and Nancy Wiggins for their helpful criticisms of
an earlier version of this monograph.

Jackson and Messick's (1958) influen-
tial distinction between content and style
in personality assessment was partly moti- 
vated by their desire to measure the former 
class of variables with more precision. 
Their methodological innovations enabled 
them to separate, within limits, components 
of content and style in the MMPI (Jack- 
son & Messick, 1961). Several studies la- 
ter, their view of content has a wistful 
tone: “Actually, we are very much con- 
cerned with measuring content, but con- 
tent—like a tarpon being hunted by a 
spear fisherman at ten fathoms—usually 
appears somewhat closer, larger, and more 
easily captured than is actually the case 
[Jackson & Messick, 1962].” Without
denying the rather poor showing that con- 
tent made in their studies, it should be 
noted that their criterion for content de-
manded interitem consistencies within the
MMPI clinical scales that survived the
partialing out of two potent sources of con-
tent-confounded stylistic variance—“ac-
quiescence” and “social desirability.” As
will be indicated later, the assessment 
strategy of contrasted criterion groups 
(Wiggins, 1962) which was employed in 
the development of the MMPI clinical 
scales cannot be expected to insure con- 
tent homogeneity within empirical scales. 

The most nihilistic position with respect 
to personality test item content has been 
taken by Irwin Berg, the originator of the 
Deviation Hypothesis (Berg, 1955, 1959, 
1961). The assessment strategy of statisti- 
cal differentiation between deviant and 
normative groups has impressed Berg as 
being so fundamental to personality meas- 
urement as to render unimportant the item 
content whereby this differentiation is 
achieved (Berg, 1959). In many ways this 
position is a restatement of the pragmati- 
cism of the empirical movement (Meehl, 
1945) with a non sequitur corollary which 
makes the blindness of blind empiricism a 
virtue. Berg’s main point with respect to 
content seems to be that any given content 
may be considered in principle to be as 
effective for a predictive task as any other 
content and that recourse should be made 
to empirical evidence as the final arbiter.
He is careful to note that this does not 
mean that “...any item is just as good as 
every other for discriminative purposes 
[Berg, 1959, p. 89].” However, he tends to 
overstate his case: 


...One should be able to construct the MMPI
scales from the Strong Interest Blank and the 
Strong occupational scales from the MMPI items 
by using the same technique. Or, for that matter, 
one should be able to develop the scales of both 
tests from almost any hodge-podge of a similar 
number of items.... Given enough deviant re- 
sponses and clean criterion groups, one should be 
able to duplicate any existing personality, interest, 
occupational and similar scales without regard to 
particular item content [Berg, 1955, p. 70]. 


The basis of the above inference is not 
clear since Berg is unable to provide even 
a rudimentary rationale whereby one might 
be able to predict the suitability of a given 
content for a given assessment. Similarly, 
when he states: “...the carefully de-
scribed 26 categories of test item content
employed by the MMPI are probably ir- 
relevant for clinical measurement purposes 
[Berg, 1961, p. 361],” it is not clear how one 
might know this in advance of empirical 
test. One may argue that, in principle, al- 
ternative and equally effective item pools 
might be discovered for any given pre- 
diction situation and such a principle can- 
not be disproved. To prejudge a given item 
pool requires a theory of content, however, 
and Berg’s contribution to this enterprise 
has been mainly a negative one (Norman, 
1963). 

In light of the foregoing discussion it is 
not surprising that the 26 content cate- 
gories involved in the original classifica- 
tion of the MMPI item pool have received 
little attention in the literature (Wiggins 
& Vollmar, 1959). The test authors them- 
selves (Hathaway & McKinley, 1940) 
were reluctant to attribute much signifi- 
cance to either selection or classification of 
item content, although their aim “... that 
more varied subject matter be included to 
obtain a wider sampling of behavior of 
significance to the psychiatrist [p. 249]”
seems clearly to have been met. The con- 
tent categories, themselves, have not ex- 
cited the curiosity of many of the authors 
who have contributed to the nearly 1,000 
articles (Dahlstrom & Welsh, 1960) on 
the MMPI that have since appeared. 

While the academic and professional 
community have seen fit to ignore or 
denigrate the content of the MMPI, other 
segments of our society (“subjects”) have 
been less quiescent (see American Psychol- 
ogist, APA, 1965). Viewed from the other 
side of the desk, the 566 items of the 
MMPI appear to represent a massive in- 
vasion of privacy. Appeals to the princi- 
ples of empiricism (Gordon, 1965) serve 
only to emphasize the insensitivity of the 
professionals involved, and such appeals 
hardly justify the use of any particular set 
of items for a given selection purpose. At- 
tempts to placate the public by removing 
the more “offensive” items from the MMPI 
pool (Braaten, 1965) cannot be justified on 
a scientific basis. In short, a legitimate is-
sue has been raised concerning the content
of personality inventory items, and the
scientific and professional community has 
been stirred from an undeserved com- 
placency. The viewpoint that a personality 
test protocol represents a communication 
between the subject and the tester (or the 
institution he represents) has much to 
commend it; not the least of which is the 
likelihood that this is the frame of refer- 
ence adopted by the subject, himself. 

The present study represents a first step 
in the direction of clarifying the content 
of the MMPI item pool. Starting with the 
original content classifications of Hatha-
way and McKinley, both psychometric
and intuitive procedures were employed in 
the development of a set of scales de- 
signed to be internally consistent, moder- 
ately independent, and representative of 
the major substantive clusters that ap- 
peared to exist in the total MMPI item 
pool. 

There have been a number of previous 
attempts to provide bases for regrouping 
MMPI items in ways other than that pro- 
vided by the standard empirical scales. 
For the most part, these studies have used 
existing empirical scales as the basis for 
further regrouping. Homogeneous sub- 
groupings of items within each of the 
standard empirical scales have been identi- 
fied on a rational basis by Harris and 
Lingoes (1955) and on a factor analytic 
basis by Comrey (1957a, 1957b, 1957c, 
1958a, 1958b, 1958c, 1958d) and Comrey 
and Marggraff (1958). Among other 
things, these studies have indicated that 
the standard MMPI empirical scales are 
far from homogeneous in item content and 
that the dimensionality of the MMPI item 
pool might be greater than that suggested 
by factorial studies of the individual scales 
(Lingoes, 1960). It is important to note, 
however, that the substantive dimensions 
which emerged in these studies are dimen- 
sions which are defined in relation to the 
original empirical scales. These clusterings 
are based on only a portion of the total 
MMPI and represent subclusters of con- 
tent “filtered through” the strategy of con- 
trasted groups employed in the construc-
tion of the original scales. Such clusterings,
no doubt, contain meaningful dimensions
of item content, but, in addition, they 
contain variance peculiar to all dimensions 
along which the originally contrasted nor- 
mal and psychiatric groups differed (Wig- 
gins, 1962). 

Attempts at more efficient measurement 
through factorially derived scales have 
also been conducted within the limited 
context of the original empirical scales. 
Welsh (1956) cluster analyzed nonover- 
lapping clinical scales to obtain markers 
for his item analytic procedures which 
yielded the well known A and R scales. 
Similarly, Eichman (1961, 1962) used the 
results of a factor analysis of clinical 
scales to derive his factor scales. Although 
working on the item level, Mees (1959) 
employed only 119 items selected from 
the standard clinical scales in developing 
his item factor scales. The fact that only 
subsets of items defined by the clinical 
scales were employed in these factorial 
studies probably does not seriously detract 
from their goal of more efficient measure- 
ment. Factor scales for the MMPI can be 
developed from almost any subsample of 
items (Wiggins & Lovell, 1965) and pos- 
sibly from just a few direct statements 
(Peterson, 1965). Unfortunately, the fac- 
torial homogeneity of MMPI items has 
made the test particularly vulnerable 
to interpretations of stylistic (Edwards & 
Diers, 1962; Jackson & Messick, 1961; 
Messick & Jackson, 1961) and method 
(Wiggins & Lovell, 1965) components 
being involved or contaminated with sub- 
stantive components. The extent of this 
contamination cannot be fully assessed un- 
til a serious effort has been made to il- 
luminate the substantive dimensions of 
the total item pool rather than simply that 
portion of it which is most responsive 
to the strategy of contrasted groups. 


ORIGINAL CONTENT CATEGORIES 


In selecting items for possible inclusion 
in the final version of the MMPI, the 
"universe of content" (Loevinger, 1957) 
was deemed to be "behaviors of signifi- 
cance to the psychiatrist" (Hathaway & 
McKinley, 1940). With this in mind,


...the items were supplied from several psychi-
atric examination direction forms, from various
textbooks of psychiatry, from certain of the direc- 
tions for case taking in medicine and neurology, 
and from the earlier published scales of personal 
and social attitudes [Hathaway & McKinley, 
1940, p. 249]. 


The names of the 26 categories suggested 
by the test authors as descriptive of item 
clusters in the MMPI pool are given in Ta- 
ble 1. To these 26 labels have been added 
phrases that are descriptive of the item 
content within each category. 


Internal Consistency 


As a first step in the investigation of the 
contribution of the original content cate- 
gories to test variance, each category was 
considered to be a “scale” composed of n 
items which could be combined to yield a 
single total score for any individual. As a 
preliminary scoring method, each item 
was keyed in the direction of “deviance” 
as determined by the infrequent item op- 
tion chosen in the Minnesota normal popu- 
lation (Hathaway & McKinley, 1951, 
pp. 26-29).* It should be emphasized that
this scoring procedure is not entirely con- 
sistent with the empirically determined 
keying direction of the MMPI clinical 
scales, since several of the clinical scales 
contain items which are keyed in the pop- 
ular direction. More important, such a 
scoring procedure in no way insures opti- 
mal scale homogeneity since both ends of 
an attitudinal continuum may be deviant 
with respect to population norms. In the 
case of sexual attitudes, for example, items 
admitting both antisexual attitudes and 
sexual acting out are deviant and hence 
both keyed in the same direction although 
such keying is intuitively inconsistent. 
However, in the absence of detailed infor- 
mation concerning such things as interitem 
correlations, the preliminary scoring method
was considered the one most compatible
with the original purpose of the item pool.

*It is important to note that such a keying
procedure is correlated with “absolute” rather
than “relative” deviance in Berg’s (1961) usage.
Although the infrequent item option in the Min-
nesota normals is often assumed to be an “ab-
normal” response (i.e., typical of psychiatric popu-
lations) such an assumption is more likely wrong
than right (Ullmann & Wiggins, 1962).

The internal consistency of content 
categories thus formed was assessed from 
the full-scale MMPI protocols of 500 
Stanford University students in introduc- 
tory psychology. Total scores on odd and 
even items within each of the 26 content 
categories were obtained separately for 
250 men and 250 women students. Corre- 
lations between odd and even item totals, 
corrected by the Spearman-Brown formula 
for double test length, are given in Table 2.
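
The odd-even procedure with the Spearman-Brown correction for double test length can be sketched as below. The data are simulated purely for illustration (they are not the Stanford protocols), and the function names are hypothetical.

```python
import numpy as np


def spearman_brown(r_half):
    """Step a half-test correlation up to the reliability of the full test."""
    return 2 * r_half / (1 + r_half)


def odd_even_reliability(item_scores):
    """item_scores: (persons x items) array of keyed 0/1 item scores."""
    odd_total = item_scores[:, 0::2].sum(axis=1)    # items 1, 3, 5, ...
    even_total = item_scores[:, 1::2].sum(axis=1)   # items 2, 4, 6, ...
    r_half = np.corrcoef(odd_total, even_total)[0, 1]
    return spearman_brown(r_half)


# Simulated example: 250 respondents, 32 items sharing one latent trait.
rng = np.random.default_rng(0)
trait = rng.normal(size=250)
items = (trait[:, None] + rng.normal(size=(250, 32)) > 0).astype(int)
r_xx = odd_even_reliability(items)
print(round(r_xx, 3))
```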


The internal consistencies of the origi- 
nal content categories can be seen to vary 
from near-zero coefficients to coefficients in 
the .80s. Directly comparable internal 
consistency appraisals of the standard 
MMPI clinical scales have not been re- 
ported for college students taking the 
group form (Dahlstrom & Welsh, 1960, p. 
474). However, the most comparable data 
available (Gilliland & Colgin, 1951) sug- 
gest that the majority of content cate- 
gories have internal consistency coefficients 
equal to or greater than those reported for 
the standard MMPI clinical scales. As in- 
dicated in Table 2, the internal consisten- 
cies of many of the content categories are 
hampered by containing small numbers of 
items. Hathaway and McKinley (1940) 
were not explicit about the extent to 
which the number of items in a given cate- 
gory can be taken as representative of the 
relative significance of the category to a 
psychiatrist. Whether arising from im- 
plicit, explicit, or fortuitous circumstances, 
there are definite psychometric restrictions 
on the extent to which 5-item content 
categories may contribute to total test 
variance as contrasted with 55- or 72-item
categories. When all content category in- 
ternal consistency coefficients are corrected 
to the common base of the largest cate- 
gory (72 items), there is little to dis- 
courage an investigator from developing 
an expanded pool of items for any of the 
categories simply because some of them 
happen to be underrepresented in the
MMPI. The obtained reliabilities are even
more impressive in light of the previously
mentioned fact that the scoring proce-
dures employed did not insure scale ho-
mogeneity.


TABLE 1
ORIGINAL CONTENT CATEGORIES OF THE MMPI


Affect, depressive (32 items): Sadness, despair, pessimism, futility; loneliness; guilt and expectation
of punishment; worrying and brooding; sensitivity; anxiety; psychomotor retardation.

Social Attitudes (72 items): Introverted, seclusive; withdrawn; shy; nonoutgoing; non-fun-loving;
overly sensitive; irritable; feels misunderstood; lacking in self-confidence; social rigidity;
uncommunicative; lacking in social aggressiveness; critical and resentful of others.

Morale (33 items): Lacks self-confidence; low self-esteem; works and lives under tension; difficulties
in concentrating, planning, making decisions, completing tasks; expects failure and resents success of
others; suggestible and immature; feels misunderstood and unappreciated; sensitive and pessimistic.

Political Attitudes—law and order (46 items): Sees world as jungle, identification with criminal code,
distrust of motives of others, discipline problem in school, delinquent childhood, thrill seeking,
resentment and distrust of authority, competitive and vindictive, independence from norms, lack of
concern for family members’ misbehaviors, opinionated.

Obsessive and Compulsive States (15 items): Obsessions, compulsions, rumination, destructive
impulses, covert defiance, overt compliance.

General Neurologic (19 items): Headaches, nausea, seizures, lability, poor judgment, distractibility,
poor memory.

Vasomotor, Trophic, Speech, Secretory (10 items): Hot and cold sensations, sweating, blushing, dry
mouth, poor reading comprehension.

Delusions, Hallucinations, Illusions, Ideas of Reference (31 items): Delusions of persecution and
grandeur, ideas of influence, suspiciousness, hallucinations, bizarre experiences, malevolent forces in
environment.

Phobias (29 items): Admission of general fearfulness and worry; specific irrational fears of animals,
states of nature, disease, heights, crowds, etc.

Family and Marital (26 items): Lack of affection for parents; domination by parents; lack of parental
support; desire to leave home; poverty; strife within family; disapproval, resentment, ambivalence,
and annoyance at family members; disappointment in love; never been in love.

Lie Items (15 items): Naive and improbable claims to virtue with venial sins such as procrastination,
vanity, gossip, citizenship, mild anger, bad thoughts, competitiveness, etc.

Masculinity-Femininity (55 items): Feminine interest pattern in literature, hobbies and childhood
games; preference for feminine as opposed to masculine vocations; confused sexual identity; admission
of weakness, fears, worries and distress.

General Health (9 items): Poor health, worry about health, high strung, weight fluctuation, easily
tired.

Motility and Coordination (6 items): Muscular paralysis, contraction, tremor, weakness, and
uncoordination.

Gastrointestinal System (11 items): Excessive and poor appetites, stomach trouble, constipation and
diarrhea, lump in throat.

Affect, manic (24 items): Excitement, euphoria, high energy; restless and impulsive; irritability,
quick temper, destructive impulses; optimism; flight of ideas; unpredictable; short memory; wide and
short term interests; unusual hearing.

Occupational (18 items): Rigid work habits, distractability, sensitivity to opinions and criticisms of
others, obstinance, indecisiveness, timidity, lack of self-confidence and concern about work,
resentment of boss.

Cardiorespiratory System (5 items): Chronic cough, asthma or hay fever, chest pains, vomiting or
coughing blood, pounding heart and shortness of breath.

Habits (19 items): Sleep disturbance, sensitivity to dreams, absence of dreams, excessive drinking
and use of alcohol, abstinence from alcohol, giving in to bad habits.

Cranial Nerves (11 items): Disturbances in vision, speech, audition, olfaction and swallowing; facial
paralysis.

Sensibility (5 items): Hypersensitivity to pain, touch; numbness; tingling skin sensations.

Educational (12 items): Dislike of reading—both fiction and nonfiction, likes funny papers and articles
on crime, slow learner and disliked school.

Religious Attitudes (19 items): Fundamentalist beliefs, rejection of fundamentalist beliefs, unusual
religious experiences, religiosity, magical beliefs, lack of praying and church attendance.

Sadistic, Masochistic Trends (7 items): Enjoys hurting and being hurt by loved ones, cruelty to
animals, enjoys frightening people, fetishism.

Sexual Attitudes (16 items): Anxiety over sex, sexual preoccupation, sexual perversion, suppressive
attitudes toward sex, permissive attitudes toward sex, disgust and embarrassment about sex.

Genitourinary System (5 items): Disturbance in urination, skin rash, something wrong with sex
organs.


TABLE 2
CORRECTED ODD-EVEN RELIABILITY COEFFICIENTS FOR 26 ORIGINAL
CONTENT CATEGORIES OF THE MMPI

Category                       n    rxx (N = 250 men)   rxx (N = 250 women)

Affect, depressive            32         .865                 .851
Social Attitudes              72         .800                 .775
Morale                        33         .802                 .738
Political Attitudes           46         .727                 .632
Obsessive-Compulsive          15         .0608                .601
General Neurologic            19         .644                 .693
Vasomotor                     10         .632                 .628
Delusions                     31         .624                 .470
Phobias                       29         .622                 .728
Family and Marital            26         .581                 .471
Lie Items                     15         .550                 .652
Masculinity                   55         .547                 .505
General Health                 9         .476                 .307
Motility and Coordination      6         [illegible]          .553
Gastrointestinal              11         .470                 .353
Affect, manic                 24         .454                 .510
Occupational                  18         .436                 .396
Cardiorespiratory              5         .422                 .477
Habits                        19         .418                 .588
Cranial Nerves                11         .417                 .538
Sensibility                    5         .338                 .397
Educational                   12         .333                 .261
Religious Attitudes           19         .258                 .184
Sadistic-Masochistic           7         .244                 .302
Sexual Attitudes              16         .216                 .249
Genitourinary                  5         .169                 .176
Total                        550
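
The correction of each coefficient to the common base of the largest category (72 items) is presumably the general Spearman-Brown prophecy formula; that reading is an assumption, since the text does not name the formula. A minimal sketch, using three of the coefficients reported for men and a hypothetical function name:

```python
def project_reliability(r_xx, n_items, target_items=72):
    """General Spearman-Brown formula for a test lengthened by factor k.

    r_xx is the reliability of the existing n_items-item category;
    k = target_items / n_items is the assumed lengthening factor.
    """
    k = target_items / n_items
    return k * r_xx / (1 + (k - 1) * r_xx)


# (category, number of items, corrected odd-even coefficient for men)
examples = [
    ("Genitourinary", 5, 0.169),
    ("Sexual Attitudes", 16, 0.216),
    ("Cardiorespiratory", 5, 0.422),
]
for name, n_items, r_xx in examples:
    print(f"{name}: {project_reliability(r_xx, n_items):.2f}")
```

The projection assumes that any added items would be as homogeneous as those already in the category, which is the sense in which short categories are "underrepresented" rather than intrinsically unreliable.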

It is of interest to note that the three 
content categories which have the highest 
internal consistencies for both men and 
women are Affect-Depressive, Social Atti- 
tudes, and Morale. As previously reported 
(Wiggins & Vollmar, 1959), these three 
categories account for some 70% of the 
item content of Welsh’s A scale, the em- 
pirically derived marker of the potent first 
factor of the MMPI (Welsh, 1956). A 
decade ago, such an observation would 
have lent encouragement to a substantive 
interpretation of the first factor of the 
MMPI. It is now generally recognized 
that an unfortunate confounding of item 
characteristics and content militates
against any such straightforward inter-
pretation (Block, 1965; Dicken, Van Pelt,
& Bock, 1965; Wiggins, 1962; Wiggins &
Goldberg, 1965).


Factorial Structure 


Despite the fact that some of the 
original content categories are not reliably 
represented and that the present scoring 
method is less than optimal, it is of con- 
siderable interest to inquire into the num- 
ber and kinds of substantive dimensions 
represented in the total MMPI item pool.
Such an analysis would be the first to be 
based on a mutually exclusive and exhaus- 
tive classification of MMPI items. 

Accordingly, product-moment intercor- 
relations among the 26 total scale scores 
were computed separately in the college 
samples of 250 men and 250 women. The 
matrices of content category intercorrela- 
tions were factored by the method of prin- 
cipal components (Harman, 1960). Latent 
roots which exceeded unity were retained 
and rotated analytically to a varimax cri- 
terion (Kaiser, 1958). 
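
The analytic sequence just described (principal components of the correlation matrix, retention of latent roots exceeding unity, varimax rotation) can be sketched as follows. This is an illustrative implementation on simulated data, not the program used in the study; the function names and the iterative SVD form of the varimax algorithm are choices made for this example.

```python
import numpy as np


def principal_components(R):
    """Unrotated loadings for components of R whose latent roots exceed 1."""
    roots, vectors = np.linalg.eigh(R)
    order = np.argsort(roots)[::-1]            # largest root first
    roots, vectors = roots[order], vectors[:, order]
    keep = roots > 1.0
    return vectors[:, keep] * np.sqrt(roots[keep])


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal rotation of a loading matrix toward the varimax criterion."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated**3 - rotated @ np.diag((rotated**2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        if s.sum() < criterion * (1 + tol):    # no further improvement
            break
        criterion = s.sum()
    return loadings @ rotation


# Simulated scores: 500 persons, 6 "categories" built from 2 latent dimensions.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
pattern = np.array([[.8, 0], [.8, 0], [.8, 0], [0, .8], [0, .8], [0, .8]])
scores = latent @ pattern.T + 0.6 * rng.normal(size=(500, 6))

R = np.corrcoef(scores, rowvar=False)          # product-moment intercorrelations
unrotated = principal_components(R)
rotated = varimax(unrotated)
percent_total_variance = 100 * (rotated**2).sum() / R.shape[0]
print(rotated.shape, round(percent_total_variance, 1))
```

Because unities are left in the diagonal of the correlation matrix, the retained dimensions are components rather than common factors, and an orthogonal rotation leaves each variable's communality unchanged.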

The method of analysis employed yielded 
seven factors* for men and six for
women which accounted respectively for 
60.9% and 55.1% of the total variance. The 
rotated factor matrices for men and women 
are presented in Tables 3 and 4. Factor 
loadings less than .33 have been omitted, 
and the matrices have been arranged in 
such a way as to facilitate comparison. 
Factor interpretation will be further facili- 
tated by consulting the content category 
descriptions provided in Table 1. 

Factor I appears to be the familiar gen- 
eral maladjustment dimension of the 
MMPI clinical scales. The content cate- 
gories which load this factor most heavily 
reflect subjectively experienced distress on 
the part of the respondent. Low self-es- 
teem, depressed mood, and feelings of 
inadequacy are coupled with social uneasi- 
ness and introversion. Anxiety is experi- 
enced directly with its usual physiological 
manifestations. This factor is loaded mod- 


*Sinee communalities were arbitrarily set equal 
to unity, the resultant components are not to b 
considered “factors” in Thurstone’s sense of this 
word, although they will be referred to as such. 
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TABLE 3 
ROTATED Factor MATRIX or 26 ORIGINAL CONTENT CATEGORIES 
(N = 250 men) 

I 0 Tl, pests Iv y VI k 
Morale -74 .81 
Vasomotor 74 .66 
Social Attitudes -70 —.34 .68 
Affect, depressive .61 —.41 —.83 .82 
Occupational .45 —.53 .54 
Phobias AL -39 —.39 59 
Affect, manic .40 —.54 .58 
Obsessive .40 —.52 58 
Masculinity .33 —.85 44  —.54 .62 
Political Attitude —.63 .49 
Sadistic —.61 58 
Lie 54 61 
Delusions —.40 .39 .59 
Sensibility —.40 .58 59 
Cranial NI .61 
Gastrointestinal .55 50 
Genitourinary 52 —.50 56 
General Neurologic 50 .33 63 
Cardiorespiratory 49 —.44 51 
Habits 35 45 48 
Motility -.79 69 
General Health —.08 56 
Sexual Attitudes —.73 64 
Family and Marital — .67 .62 
Religious Attitudes 44 57 
Educational .85 .80 
Variance (percent) 20.3 18.9 16.8 11.8 9.6 8.2 14.4 


erately by items reflecting irrational fears, 
restless irritability, and obsessional think- 
ing. In men, this general maladjustment is 
also reflected in poor work habits and 
feminine interests. In women, maladjust- 
ment includes an unsatisfactory family 
background and a greater emphasis on 
poor physical health and undesirable hab-
its.

Factor II is loaded by an intriguing com- 
bination of contents which have heretofore 
been observed to covary only in highly 
specialized instruments. The political at-
titudes category reflects authority conflict
and authoritarian attitudes toward law
and order. A sado-masochistic orientation
is combined with obsessive-compulsive 
symptoms and overt compliance. Naive 
and improbable claims to virtue may fur- 
ther suggest rigidity. Together these cate- 
gories provide an almost classic descrip- 
tion of the authoritarian personality 


syndrome which has been described in a 
variety of other contexts (Adorno et al., 
1950; Loevinger, 1962; Rokeach, 1960; 
Stern, Stein, & Bloom, 1956). Although 
this factor is most clearly defined for 
women, the additional categories which 
load it for men are compatible with the 
areas of maladjustment often associated 
with authoritarianism. In men, the mood 
disturbances, poor work habits, sensitivi- 
ties, delusional thinking, and feminine in- 
terests may reflect a more deep-seated per- 
sonality disturbance than is the case with 
authoritarianism in women. 

Two dimensions of physical symptoms in 
men (Factors IIIa and IIIb) appear to be 
combined in a single dimension of physi-
cal complaint for women (Factor III).
Factor IIIa in men is loaded by a variety
of complaints presumably representative of
disturbance in cranial nerve, gastrointes-
tinal, sensibility, genitourinary, neurologic,
and cardiorespiratory systems. Factor
IIIb appears to center around general
health concern, symptoms of fatigue and
cardiorespiratory complaints. Both of these
factors are loaded slightly by categories
of psychological symptoms as well. In
women, fatigue and general health con-
cern are combined with the aforementioned
systemic symptoms into one general factor
of somatic complaint. With the exception
of the manic category reflecting fast tempo
and irritability, the psychological symp-
tom categories appear quite secondary in
their contribution to this factor.

Factor IV is highly and uniquely loaded
by the category of sexual attitudes. Cate-
gories associated with deviant sexual at-
titudes vary remarkably for men and
women. In men such attitudes are asso-
ciated with family conflict and specific
genitourinary complaints. In women, this
factor is negatively loaded by improbable
claims to virtue and positively loaded by
feminine interests. Psychological symp-
toms in women take the form of irrational
fears, poor work habits, and feelings of de-
pression and guilt.


TABLE 4
ROTATED FACTOR MATRIX OF 26 ORIGINAL CONTENT CATEGORIES
(N = 250 women)

I II III IV V VI h²
Morale +75 74 
Vasomotor 44 .52 
Social Attitudes .80 .70 
Affect, depressive .78 .94 .81 
Occupational .55 .52 
Phobias Al .52 .56 
Affect, manic 41 —.41 53) 
Obsessive 47 387 — .33 .59 
Masculinity —.88 42 — 44 57 
Political Attitudes 51 —.83 —.84 .58 
Sadistic .80 .66 
Lie —.85 —.56 .46 
Delusions —.57 — .34 .60 
Sensibility — .65 62 
Cranial — .64 43 
Gastrointestinal —.44 ES 
Genitourinary —.83 44 .98 
General Neurologic 47 — .63 .63 
Cardiorespiratory —.56 43, 
Habits 387 — .38 .97 
Motility —.76 E 
General Health .51 —.85 .50 
Sexual Attitudes .69 .50 
Family and Marital 54 45 
Religious Attitudes — .84 73 
Educational .63 42 
Variance (percent) 26.9 11.4 26.6 16.6 9.5 9.0 

Factor V is strongly and uniquely loaded 
by deviant religious attitudes. In men, 
sleep disturbance, drinking habits, and
feminine interests tend to have loadings on 
this factor, and to a lesser extent, some 
somatic complaints. In women, authoritar- 
ian attitudes and delusional thinking have 
very slight loadings on the deviant re- 
ligious attitudes factor. 

Factor VI is defined by deviant educa- 
tional attitudes. In both men and women, 
antieducational attitudes are associated 
with masculine interests. In women, this
factor is also loaded by the category of
genitourinary complaints.

In summary, a principal-component anal- 
ysis of the 26 mutually exclusive and 




exhaustive content categories of the MMPI 
yielded six interpretable factors in both 
men and women. The first three of these 
factors appear to represent general syn- 
dromes of complaint while the last three 
appear to center around more specific 
substantive categories. The factors of gen- 
eral maladjustment and somatic com- 
plaint are familiar ones which might be 
anticipated both on the basis of clinical 
scale development and of the overrepresen- 
tation of such categories in the MMPI item 
pool. The factor of authoritarianism ap- 
pears to represent a theoretically mean- 
ingful combination of substantive categor- 
ies which has, until now, been obscured by 
the strategy employed in the development 
of the clinical scales. Deviant attitudes 
towards sex, religion, and education have 
likewise not been previously stressed as 
important substantive components of the 
MMPI item pool. 


REVISION OF ORIGINAL CONTENT
CATEGORIES 


Given the encouraging internal con- 
sistencies and factorial structure of the 
original content categories, it seemed 
fruitful to attempt a more substantively 
consistent grouping of items within cate- 
gories as a basis for subsequent develop- 
ment of actual content scales. Although 
many strategies of scale construction 
were possible at this point, the one chosen 
placed primary emphasis on the “ra- 
tional” or substantive considerations in- 
volved in the classification of item content. 
Since this strategy is so antithetical to the 
traditional approach to MMPI scale con- 
struction, a brief justification seems re- 
quired. 

For better or (more likely) for worse, 
the MMPI represents a fixed item pool.
Examination of the interrelationships
among many characteristics of this item 
pool led Wiggins and Goldberg (1965) to 
conclude: 


Over- and under-representation of certain classes
of desirability, endorsement, ambiguity, and gram-
matical characteristics tends to make the item
pool unnecessarily homogeneous and may, in part,
contribute to rather severe restrictions in criterion
group discriminations. The fortuitous confounding
of such item characteristics with substantive di-
mensions (Block, 1965; Wiggins, 1962) has created
interpretative problems (Edwards, 1957; Jackson
& Messick, 1961) which may never be satisfac-
torily resolved within any fixed item pool [pp. 394-
395].


Although these authors stress the impor- 
tance of basic research in item develop-
ment, such research will not be of im- 
mediate value to the practical consumer of 
the MMPI. The present attempt to de- 
velop substantive scales for the MMPI was 
not initiated with the hope of overcoming 
the built-in shortcomings of the item pool. 
However, it was predicated on the as- 
sumption that the interaction of item 
characteristics and stylistic tendencies 
with substantive dimensions might be bet- 
ter understood than the interaction of 
such sources of variance with the complex 
and poorly understood dimensions yielded 
by the strategy of contrasted groups (e.g., 
"hysteria"). 

The method whereby the original MMPI 
content categories were revised involved 
the collapsing of several categories into 
single categories, reassignment of items 
from one category to another, elimination
of original categories, creation of new ones 
and rekeying of item options within cate- 
gories. Procedures were, with one minor ex- 
ception, completely intuitive, and no claim 
is made for their replicability.⁴

The major item regroupings involved 
physical symptoms, interests, and items
reflecting manifest hostility. Items from 
General Health, Cardiorespiratory, Gas- 
trointestinal, and Genitourinary were com- 
bined in the single revised category of 
“Poor Health.” Items from General Neuro-
logic, Cranial Nerves, Motility and Coor-
dination, and Sensibility were combined
into a single category of “Organic Symp- 
toms.” Items reflecting hostility from the 
Sadistic-Masochistic category formed the 
nucleus of a new category of “Manifest 
Hostility” to which 21 items from 7 other 
original categories were added. A small 


⁴ The assistance of Victor R. Lovell in perform-
ing these item regroupings is gratefully acknowl-
edged.


TABLE 5
CORRECTED ODD-EVEN INTERNAL CONSISTENCY
COEFFICIENTS OF REVISED CONTENT
CATEGORIES IN TWO COLLEGE
POPULATIONS

                Number     Stanford    Stanford    Oregon men
Category        of items   men         women       and women
                           (N = 250)   (N = 250)   (N = 203)
Religious         15         .87         .86         .81
Social            56         .84         .83         .80
Depression        33         .83         .82         .78
Morale            40         .84         .79         .74
Authority         43         bf          .71         .80
Phobias           27         02          .80         .75
Hostility         27         .75         .78         .75
Organic           36         m           .79         .72
Psychoticism      48         .75         .72         T
Family            27         .74         .67         .43
Hypomania         25         .72         E           .66
Health            28         .76         .70         .52
Feminine          56         .55         .60         .82
Sleep             15         .56         .58         .52
Obsessive         27         .52         .55         .50
Addiction          6         .67         .53         .42
Lie               15         .55         .65         .55
Vasomotor         10         .62         .60         .58
Sexual            16         .84         .51         .53


group of items from the Habits category 
were considered separately as an “Addic- 
tion" category. 

The Occupational Attitudes category 
was judged too heterogeneous, and items 
from this category were regrouped under 
"Obsessive-Compulsive," “Poor Morale,” 
and four other revised categories. Original 
categories that were retained were purified 
around a central theme and items elimi- 
nated or borrowed from other categories in 
light of this theme. The category of Hab- 
its, for example, was redefined as “Sleep- 
ing Habits” which eliminated seven of the 
original items and added three from other 
categories. 

The categories of Educational Attitudes 
and Masculinity-Femininity were placed in 
a common pool and from this pool pre- 
liminary attempts were made to differen- 
tiate feminine interest patterns from ten- 
dencies toward sexual inversion. When 
this differentiation was judged unsuccess- 
ful, a general category of “Feminine Inter- 
ests” was developed which proved to be 


ambiguous with respect to keying direc- 
tion. In the absence of a clear-cut ration- 
ale, the empirical norms of Drake (1953) 
were used as a basis for item keying. Items 
in the “Feminine Interest” category which 
significantly differentiated men and women 
in Drake’s sample were retained and keyed 
in the female direction. 


Internal Consistency of Revised Categories 


A more substantively consistent arrange- 
ment of items into content categories 
should be reflected in increased internal 
consistencies in the revised set. Total 
scores on odd and even items were com- 
puted for each of the 18 revised categories 
in samples of 250 men and 250 women 
students from Stanford University. Since 
these samples had, in part, inspired the re- 
classification, odd and even totals were 
also computed in a mixed group of 203 
men and women introductory psychology 
students from the University of Oregon.’ 
Table 5 presents Spearman-Brown cor- 
rected internal consistency coefficients for 
the Stanford and Oregon samples. Since 
new content categories were created and 
old ones considerably altered in the re- 
vision of the content categories, the suc- 
cess of the revision procedures cannot, in 
all cases, be directly assessed by compari- 
son of each category with its revised 
counterpart. A slight decline in internal con- 
sistency occurred in depression, obsessive- 
compulsive and vasomotor categories. This 
is more than offset by the increases in in- 
ternal consistency which occurred in 14 
categories which can be compared with 
their original counterparts. The most dra-
matic increase occurred in the category of
religious attitudes where deletion of four
items and rekeying of those remaining re-
sulted in internal consistency increases
from the low .20s to the high .80s. Re-
grouping sadistic-masochistic items into
the more general category of “Manifest
Hostility” resulted in increases from the
low .30s to the middle .70s. Other in- 
creases may be noted by comparing Ta- 
ble 5 with Table 2. 
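
The odd-even procedure with the Spearman-Brown correction can be illustrated with a short sketch. The helper names `spearman_brown` and `odd_even_reliability`, and the simulated item responses, are assumptions made here for illustration; only the formula itself, r_full = 2r / (1 + r), is the standard correction for doubling test length.

```python
import numpy as np

def spearman_brown(r_half):
    """Step a half-test correlation up to the reliability of the full-length test."""
    return 2 * r_half / (1 + r_half)

def odd_even_reliability(items):
    """items: subjects x items array of keyed (0/1) responses for one category."""
    odd = items[:, 0::2].sum(axis=1)    # total on the 1st, 3rd, 5th, ... items
    even = items[:, 1::2].sum(axis=1)   # total on the 2nd, 4th, 6th, ... items
    r_half = np.corrcoef(odd, even)[0, 1]
    return spearman_brown(r_half)

# Simulated responses: 250 subjects, 20 items driven by one common trait (hypothetical).
rng = np.random.default_rng(2)
trait = rng.normal(size=(250, 1))
items = (trait + rng.normal(size=(250, 20)) > 0).astype(int)
reliability = odd_even_reliability(items)
```

For example, a correlation of .50 between odd and even halves corresponds to a corrected coefficient of about .67.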


"These data for 95 men and 108 women were 
made available by Lewis R. Goldberg.


CONSTRUCTION OF FINAL
CONTENT SCALES


On the basis of the data presented in 
Table 5, it was decided that there were 
15 substantive dimensions in the MMPI 
pool which possessed promising internal 
consistencies and sufficient numbers of 
items to warrant further exploration. These 
15 dimensions appear as the first 15 cate- 
gories in Table 5. The categories of Ad- 
diction, Lie, Vasomotor, and Sexual were 
dropped from further consideration at 
this point. The categories of Feminine In- 
terests, Sleeping Habits, and Obsessive- 
Compulsive were carried along on a very 
tentative basis. 

The Stanford sample was randomly di-
vided into two groups of 300 and 200 sub-
jects, with an equal number of men and 
women within each group. The group 
of 300 subjects served as an item analysis 
group for scale purification and the group 
of 200 subjects was used for an independent 
assessment of the homogeneity of scales
formed by item analysis. 

Point biserial correlations were com- 
puted between the 566 items of the 
MMPI⁶ and each of the 15 total scale
scores of the revised content categories. 
An item was retained in a given content 
scale if: (a) its point biserial correlation 
with the total scale of the category of 
which it was a member exceeded .30 and 
(b) if its correlation with the total scale 
of the category of which it was a member 
exceeded its correlation with all 14 re- 
maining revised content category scores. 
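
The two retention criteria can be sketched as follows. This is a minimal illustration under stated assumptions rather than the original item-analysis program: `select_items` and its arguments are hypothetical names, the responses are simulated, and the point biserial correlation is computed as the ordinary product-moment correlation between a dichotomous item and a total score, with the item left in its own category total.

```python
import numpy as np

def select_items(responses, membership, min_r=0.30):
    """Flag items meeting criteria (a) and (b).

    responses:  subjects x items array of keyed 0/1 answers
    membership: for each item, the index of the category it belongs to
    """
    n_scales = membership.max() + 1
    totals = np.stack(
        [responses[:, membership == s].sum(axis=1) for s in range(n_scales)], axis=1
    )
    keep = np.zeros(responses.shape[1], dtype=bool)
    for j in range(responses.shape[1]):
        r = np.array(
            [np.corrcoef(responses[:, j], totals[:, s])[0, 1] for s in range(n_scales)]
        )
        own = r[membership[j]]
        others = np.delete(r, membership[j])
        # (a) own-scale correlation exceeds .30; (b) it exceeds every other scale's
        keep[j] = own > min_r and np.all(own > others)
    return keep

# Simulated example: two independent dimensions, 25 items (hypothetical data).
rng = np.random.default_rng(1)
n_subjects = 500
traits = rng.normal(size=(n_subjects, 2))
membership = np.array([0] * 13 + [1] * 12)
latent = traits[:, membership] + rng.normal(size=(n_subjects, membership.size))
latent[:, 12] = rng.normal(size=n_subjects)   # item 12: pure noise, nominally in scale 0
responses = (latent > 0).astype(int)
keep = select_items(responses, membership)
```

In this simulation the noise item correlates with its own total only through its presence in that total, so it falls short of the .30 criterion and is dropped.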

Table 6 shows the number of items elim- 
inated by each of the two criteria of item 
analysis. Among the Social Maladjustment 
items, for example, 26 items were elimi- 
nated because their correlation with the 
Social total scale score was less than .30. 
Three additional items were eliminated 
because their item-total correlations, al- 
though greater than .30, were equaled or 
exceeded by item-total correlations with one


⁶ Sixteen items are repeated in the group form
of the MMPI. In all analyses reported here, only
the first appearance of a repeated item was con-
sidered.


TABLE 6
NUMBER OF ITEMS ELIMINATED BY ITEM ANALYSIS
OF REVISED CONTENT CATEGORIES

Scale           Original n   < .30   Nonindependent   Final n
Religious           15          3          0             12
Social              56         26          3             27
Depression          33         12          2             20ᵃ
Morale              40         15          3             23ᵃ
Authority           43         21          2             20
Phobias             27          7          1             19
Hostility           27          7          1             20ᵃ
Organic             36         13          0             23
Psychoticism        48         31          4             13
Family              27         11          0             16
Hypomania           25          5          0             20
Health              28         15          0             13
Feminine            56         26          0             30
Sleep               15          3          1             11
Obsessive           27         14          3             10

ᵃ Includes one additional item from another
content category.


or more of the 14 additional content 
categories. 

It can be seen from Table 6 that item 
selection was made primarily on the basis 
of internal consistency. Only 20 items 
were eliminated on the basis of their being 
correlated with categories other than their 
own. Note, however, that the criterion of 
scale independence employed was quite 
minimal. Three items were judged to have 
been initially misclassified after examina- 
tion of their correlations with other cate- 
gories. Thus, one “Obsessive” item was 
transferred to the Depression category, one 
“Social” item to the Morale category, and 
one “Psychoticism” item to the Hostility 
category. 

The 15 revised content categories and 
the 15 content scales formed by item anal- 
ysis were then scored in the group of 200 
subjects originally set aside for this pur- 
pose. As a more general measure of scale 
homogeneity, Cronbach's coefficient alpha 
(Cronbach, 1951) was computed for the 15 
categories and 15 scales. Content scales 
were judged improved by item analysis if 
their alpha coefficient increased despite the 
elimination of substantial proportions of 
items. 
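
Coefficient alpha can be computed directly from an item score matrix. The sketch below is illustrative only: `cronbach_alpha` is a name chosen here, and the four-subject, two-item matrix is a made-up example small enough to verify by hand.

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha for a subjects x items score matrix."""
    k = items.shape[1]
    sum_item_var = items.var(axis=0, ddof=1).sum()   # sum of the item variances
    total_var = items.sum(axis=1).var(ddof=1)        # variance of the total score
    return k / (k - 1) * (1 - sum_item_var / total_var)

# Tiny made-up example: item variances 3/12 and 4/12, total-score variance 11/12,
# so alpha = 2 * (1 - 7/11) = 8/11.
example = np.array([[1, 1], [0, 0], [1, 0], [1, 1]])
alpha = cronbach_alpha(example)
```

Because alpha depends on the number of items as well as on their intercorrelations, a scale that loses many items yet shows a higher alpha has gained in average item homogeneity, which is the sense in which scales were judged improved above.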

Table 7 presents alpha coefficients for

TABLE 7
COEFFICIENT ALPHA INTERNAL CONSISTENCY
ESTIMATES FOR REVISED CATEGORIES AND
ITEM-ANALYZED CONTENT SCALES
(N = 200 men and women)

Category        Revised   (n)    Final   (n)    Revised-final r
Religious         81      (15)    83     (12)        98
Social            83      (56)    86     (27)        95
Depression        84      (33)    82     (20)        96
Morale            81      (40)    84     (23)        93
Authority         77      (43)    78     (20)        92
Phobias           70      (27)    67     (19)        96
Hostility         72      (27)    69     (20)        97
Organic           76      (36)    70     (23)        96
Psychoticism      76      (48)    61     (13)        85
Family            72      (27)    72     (16)        94
Hypomania         69      (25)    67     (20)        97
Health            69      (28)    59     (13)        90
Feminine          77      (56)    84     (30)        96
Sleep             56      (15)    56     (11)        9
Obsessive         56      (27)    57     (10)        82


the revised categories and the scales 
formed by item analysis. The contami- 
nated correlation between the two sets of 
measures is presented in the final column. 
The Religion, Social, Morale, Authority, 
Family, and Feminine Interests scales 
were judged to be improved by item 
analysis. Scale purification was extreme in 
several instances and resulted in improved 
alphas despite elimination of almost half 
the items in the scale. 

Increased homogeneity was not achieved 
by item analysis for Depression, Phobias, 
Hostility, Organic, Psychoticism, Hypoma- 
nia, or Health. Subsequent attempts to im- 
prove these scales by less stringent item 
analytic criteria were not successful. It was 
decided, therefore, to retain these scales in 
their revised form. The Sleeping Habits and 
Obsessive scales were abandoned at this 
point on the grounds of unpromising homo- 
geneity. 

The foregoing procedures resulted in the 
adoption of 13 mutually exclusive scales 
which were considered to be internally 
consistent, moderately independent, and 
representative of the major substantive 
clusters of the MMPI. All of these scales 
were based on rational regroupings of the 
original content categories proposed by
Hathaway and McKinley. Six of these
scales were further refined by item-
analytic procedures. This final set of 13
scales will be referred to as the MMPI
content scales.⁷ The content of the items
in the scales is described in Table 8. 


Internal Consistency of Content Scales in 
Normal Populations 


Since virtually all of the preliminary in- 
vestigation and development of the MMPI 
content scales was based on a single col- 
lege population, it was necessary to gather 
additional data from other populations to 
assess the psychometric characteristics of 
the final scales. Accordingly, complete 
MMPI protocols were obtained from the 
samples listed in Table 9. A group of Air 
Force enlisted men served as a noncollege 
normal population while the remaining 
samples were college students of both sexes 
from several geographical regions.⁸

The internal consistency of the MMPI 
content scales was assessed by computing 
alpha coefficients in samples not involved 
in scale derivation. These data are pre-
sented in Table 10. Reliability coeffi- 
cients from the college samples are, with 
one notable exception, generally in accord 
with expectations gained from the deriva- 
tion samples. The exception is Feminine 
Interests, which, although among the most 
internally consistent scales in the deriva- 
tion sample, is the least reliable scale in 
other college and Air Force samples. More 
in line with expectations are the generally 
high internal consistencies of Social Mal- 
adjustment, Religious Fundamentalism, 
Depression, and Poor Morale in the college 
groups. As before, Hypomania and Poor 
Health are among the lowest in internal 
consistency, but the obtained alpha coeffi- 
cients are quite respectable in comparison 
with the majority of MMPI scales in use 
today. 

With the exception of Feminine In- 
terests, the alpha coefficients obtained in 


⁷ Item lists are provided in Appendix A.

⁸ The author is grateful to John D. Hundleby
and to Leonard G. Rorer for making available
the Air Force and Minnesota college data, respec-
tively.


TABLE 8
DESCRIPTION OF MMPI CONTENT SCALES


SOC Social Maladjustment: High SOC is socially bashful, shy, embarrassed, reticent, self-conscious 
and extremely reserved. Low SOC is gregarious, confident, assertive, and relates quickly and easily to 
others. He is fun loving, the life of a party, a joiner who experiences no difficulty in speaking before a 
group. This scale would correspond roughly with the popular concept of “introversion-extraversion.”

DEP Depression: High DEP experiences guilt, regret, worry, unhappiness and a feeling that life 
has lost its zest. He experiences difficulty in concentrating and has little motivation to pursue things. 
His self-esteem is low, and he is anxious and apprehensive about the future. He is sensitive to slight, 
feels misunderstood, and is convinced that he is unworthy and deserves punishment. In short he is 
classically depressed. 

FEM Feminine Interests: High FEM admits to liking feminine games, hobbies, and vocations. He 
denies liking masculine games, hobbies, and vocations. Here there is almost complete contamination of 
content and form which has been noted in other contexts by several writers. Individuals may score 
high on this scale by presenting themselves as liking many things since this item stem is present in al-
most all items. They may also score high by endorsing interests, which, although possibly feminine, are 
also socially desirable such as an interest in poetry, dramatics, news of the theatre, and artistic pursuits. 
This has been noted in the case of Wiggins’ Sd scale. Finally, of course, individuals with a genuine
preference for activities which are conceived by our culture as “feminine” will achieve high scores on 
this scale. 

MOR Poor Morale: High MOR is lacking in self-confidence, feels that he has failed in life and is 
given to despair and a tendency to give up hope. He is extremely sensitive to the feelings and reactions 
of others and feels misunderstood by them while at the same time being concerned about offending them. 
He feels useless and is socially suggestible. There is a substantive overlap here between the Depression 
and Social Maladjustment scales and the Poor Morale scale. The Social Maladjustment scale seems to 
emphasize a lack of social ascendance and poise, the Depression scale feelings of guilt and apprehension, 
while the present scale seems to emphasize a lack of self-confidence and hypersensitivity to the opinions 
of others. 

REL Religious Fundamentalism: High scorers on this scale see themselves as religious, church- 
going people who accept as true a number of fundamentalist religious convictions. They also tend to
view their faith as the true one. 

AUT Authority Conflict: High AUT sees life as a jungle and is convinced that others are unscrupu- 
lous, dishonest, hypocritical, and motivated only by personal profit. He distrusts others, has little 
respect for experts, is competitive and believes that everyone should get away with whatever they can. 

PSY Psychoticism: High PSY admits to a number of classic psychotic symptoms of a primarily 
paranoid nature. He admits to hallucinations, strange experiences, loss of control, and classic paranoid 
delusions of grandeur and persecution. He admits to feelings of unreality, daydreaming, and a sense 
that things are wrong, while feeling misunderstood by others.

ORG Organic Symptoms: High ORG admits to symptoms which are often indicative of organic 
involvement. These include headaches, nausea, dizziness, loss of motility and coordination, loss of 
consciousness, poor concentration and memory, speaking and reading difficulty, muscular control, skin 
sensations, hearing and smell.

FAM Family Problems: High FAM feels that he had an unpleasant home life characterized by a 
lack of love in the family and parents who were unnecessarily critical, nervous, quarrelsome, and quick 
tempered. Although some items are ambiguous most are phrased with reference to the parental home 
rather than the individual’s current home.

HOS Manifest Hostility: High HOS admits to sadistic impulses and a tendency to be cross, grouchy, 
competitive, argumentative, uncooperative, and retaliatory in his interpersonal relationships. He is 
often competitive and socially aggressive.

PHO Phobias: High PHO has admitted to a number of fears, many of them of the classically phobic 
variety such as heights, dark, closed spaces, etc.

HYP Hypomania: High HYP is characterized by feelings of excitement, well being, restlessness, 
and tension. He is enthusiastic, high strung, cheerful, full of energy, and apt to be hotheaded. He has 
broad interests, seeks change, and is apt to take on more than he can handle.

HEA Poor Health: High HEA is concerned about his health and has admitted to a variety of gastro- 
intestinal complaints centering around an upset stomach and difficulty in elimination. 


TABLE 9
COMPOSITION OF NORMAL SAMPLE
(N = 1,368)

Group                        Men    Women
Air Force enlisted menᵃ      261      -
Stanford University          250     250
University of Minnesota       96     125
University of Oregon          95     108
University of Illinois       100      83
Total                        802     566

ᵃ Chanute Air Force Base, Rantoul, Illinois.


the Air Force sample are substantial, in-
dicating a generality beyond college popu-
lations. Several differences in the relative
internal consistencies of the content scales
in an Air Force, as opposed to college,
population may be noted. Whereas Psy-
choticism and Organic Symptoms are only
moderately reliable in college groups,


they are among the most internally con- 
sistent scales in the Air Force sample. 
This may reflect, in part, the greater 
heterogeneity of the Air Force sample. It 
is also of interest to note that whereas 
Religious Fundamentalism is consistently
among the most reliable scales for college
groups, it is one of the least reliable scales
in the Air Force sample.


TABLE 10
COEFFICIENT ALPHA INTERNAL CONSISTENCY ESTIMATES FOR MMPI CONTENT SCALES
IN SEVEN NORMAL SAMPLES

         Air Force   University of   University of   University of   University of   University of   University of
         men         Minnesota men   Minnesota       Oregon men      Oregon women    Illinois men    Illinois women
                                     women
Scale    (N = 261)   (N = 96)        (N = 125)       (N = 95)        (N = 108)       (N = 100)       (N = 83)
SOC        .829        .856            .835            .830            .862            .856            .848
DEP        .872        .860            .831            .821            .756            .842            .854
FEM        .585        .523            .505            .594            .566            .650            .542
MOR        .857        .866            .825            .804            153             .867            .804
REL        .674        .892            .861            .842            .756            .817            .193
AUT        .681        .794            4412            .743            .669            .766            .698
PSY        .877        .794            .687            .738            .662            .763            .806
ORG        .863        .712            .645            .652            .695            .749            .781
FAM        .707        .712            .789            .712            .694            .806            .643
HOS        .764        .819            .794            .788            .651            .716            .765
PHO        .765        .663            .721            .568            .701            .705            .770
HYP        .671        .701            .715            .682            .632            .679            .607
HEA        .743        .557            .713            .555            5387            .673            .651


Group Differences in Content Scale Scores


Personality inventory scale scores which
presumably reflect individual differences
along dimensions of substantive interest
should, at the very least, be expected to
reflect such differences when diverse
groups are compared. The standard MMPI
clinical scales were constructed to reflect
differences among certain groups, and, ini-
tially, substantive interpretations of clini-
cal scales were rather narrowly restricted
to such differences, whatever they may
imply. By contrast, the MMPI content
scales were designed to reflect reliable in-
dividual differences along interpretable
substantive dimensions and group differ-
ences, where found, will serve to enhance
rather than define the meaning of the con-
tent scale involved.

The cooperation of two quite different
psychiatric installations was obtained in
securing complete MMPI protocols of pa-
tients on whom a final psychiatric diagno-
sis had been made.⁹ One installation was a
large state mental hospital whose inmates
represent a wide spectrum of psychopa-
thology, the most frequent diagnosis be-
ing that of chronic schizophrenia. The sec-
ond installation was an outpatient clinic
attached to an Air Force base whose
clientele consists primarily of neurotic,
sociopathic, and personality disorders. At
each installation, an attempt was made
to obtain the majority of recent and com-
plete MMPI protocols on patients whose
files were sufficiently complete to allow de-


⁹ The author is indebted to Paul Finkel, Clifford
M. Broadway, and other staff of Kankakee State
Hospital for their assistance in providing protocols
and case folders.


TABLE 11
COMPOSITION OF PSYCHIATRIC SAMPLE
(N = 614)

                                                  Inpatientsᵃ      Outpatientsᵇ
APA code  Diagnosis                               Men   Women      Men   Women
000-199   Brain disorders                          23     16        16     -
200-213   Affective psychoses                      20     27         -     -
220-229   Schizophrenic psychoses                  85     83         -     4
400-406   Psychoneurotic disorders                 13     23        15     7
500-504   Personality pattern disturbance          15      5        17     2
510-513   Personality trait disturbance            17      8        36     6
520-524   Sociopathic personality disturbance      46     14        19     1
530-535   Special symptom reaction                  -      -         6     -
540-546   Transient situational disturbance         -      -         8     3
          Otherᶜ                                   53     16         8     2
          Total                                   272    192       125    25

ᵃ Kankakee State Hospital, Kankakee, Illinois.
ᵇ Chanute Air Force Base Outpatient Clinic, Rantoul, Illinois.
ᶜ Rare category or indeterminate diagnosis.


termination of the final psychiatric diagno- 
sis. On the basis of information con- 
tained in the case folder, each patient was 
classified in terms of the first three digits of 
the diagnostic code given in the American 
Psychiatric Association's (1952) Diagnos-
tic and statistical manual: Mental disor- 
ders. In the inpatient sample, several pre- 
liminary diagnostic impressions were 
available in addition to the final, official 
hospital diagnosis made by the diagnostic 
staff. Where there was great discrepancy 
between preliminary and final diagnoses 
or where the final diagnosis was lacking in 
precision, the case was classified as “inde- 
terminate.” In the outpatient sample, only 
the final decision of the diagnostic staff 
was employed, and when this was im-
precise, an “indeterminate” classification
was assigned. The distribution of such 
classifications is given in Table 11 for 
both inpatient and outpatient samples. 
These distributions represent available 
records rather than any attempted sam- 
pling procedure. They are judged to be 
reasonably representative of the two kinds 
of installations involved. 

Although the MMPI is given, more or 
less, routinely at both of these installations, 
its contribution to final psychiatric diag- 
nosis is probably less than at other in- 
stallations that routinely give the MMPI. 


It should be recognized, nevertheless, that 
an unknown degree of criterion contami- 
nation exists. However, in no instance 
were MMPI content scale scores available 
to the institution prior to final diagnosis. 
From the samples listed in Tables 9 
and 11, it was possible to form seven 
fairly large groups which differed mark- 
edly among themselves in such character- 
istics as age, sex, education, and psychiat-
ric status. These groups were: (a) AFM
(261 Air Force men); (b) IPM (272 in- 
patient men); (c) IPW (192 inpatient 
women); (d) OPM (125 outpatient men) ; 
(e) OPW (25 outpatient women); (f) CM 
(96 University of Minnesota college 
men); and (g) CW (125 University of 
Minnesota college women). The 13 
MMPI content scales were scored in each 
of these seven groups. For each content 
scale, a simple analysis of variance was 
computed to test the null hypothesis that 
mean scale scores are the same for the 
populations from which the seven groups 
are derived. This hypothesis was rejected 
for all 13 scales at p < .01 by the F ratio 
with 6 and 1089 degrees of freedom. Differences
between certain group means in content
scales were further assessed by t tests
for independent groups. Of 21 possible
group comparisons, 13 were judged sensible,
and a highly condensed summary of
these 13 group comparisons is presented
in Table 12.

TABLE 12
CONFIDENCE LEVELS FOR t TESTS OF GROUP MEAN DIFFERENCES IN CONTENT SCALE SCORES

SOC | DEP | FEM | MOR | REL | AUT | PSY | ORG | FAM | HOS | PHO | HYP | HEA

AFM versus OPM    .05 | .01 .01 .01 | .001 .01
OPM versus OPW    .05 | .001 | .001 | .001 .05 | .05 | .001 .001 | .001 .001
OPW versus CW     .001 | .001 .001 .01 .001 | .001 | .001 | .001 | .001 | .05 | .001
CW versus IPW     .001 | .001 | .05 | .001 | .05 | .001 | .001 | .001 | .001 | .001 | .001 .001
OPM versus CM     .01 | .001 .001 .001 | .001 | .001 | .01 .001 | .05 | .001
IPW versus OPW    .01 .001 .001 .001 .05 | .001 .001
OPM versus IPM    .05 | .01 .001 | .01 .05 .01 .05
CM versus AFM     .05 | .001 .001 | .001 | .001 | .001 | .001 | .05 | .001 | .001 | .001 | .001
CM versus IPM     .001 | .001 | .001 .001 | .001 | .001 | .01 .001 | .001 | .001
AFM versus IPM    .001 .01 .001 .05
CM versus CW      .001 | .01 .05 | .001 .001 | .001
F ratio           393.14 | 23.18 | 255.76 | 20.43 | 8.20 | 33.21 | 17.95 | 34.71 | 436.46 | 14.31 | 32.04 | 2.94 | 25.27
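
The degrees of freedom quoted for the F ratio follow from the design: seven groups give 6 degrees of freedom between groups, and the 1,096 subjects in the seven groups give 1096 - 7 = 1089 within groups. The following is a minimal sketch of a simple (one-way) analysis of variance in plain Python. It is an illustration only and not the program used in the original analysis; the names `GROUP_SIZES`, `anova_df`, and `one_way_anova` are assumptions introduced here, and only the group sizes are taken from the text.

```python
from statistics import mean

# Group sizes as listed in the text (AFM, IPM, IPW, OPM, OPW, CM, CW).
GROUP_SIZES = {"AFM": 261, "IPM": 272, "IPW": 192, "OPM": 125,
               "OPW": 25, "CM": 96, "CW": 125}


def anova_df(sizes):
    """Return (between, within) degrees of freedom for a one-way design."""
    k = len(sizes)
    n_total = sum(sizes)
    return k - 1, n_total - k


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of lists of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    group_means = [mean(g) for g in groups]
    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-groups sum of squares: scores around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = anova_df([len(g) for g in groups])
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


if __name__ == "__main__":
    print(anova_df(list(GROUP_SIZES.values())))
    # Toy scores, invented purely to exercise the function.
    print(one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```

Applied to the seven group sizes, `anova_df` reproduces the 6 and 1089 degrees of freedom reported above.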

The last row of Table 12 contains the
F ratios for each content scale based on
all seven groups. The remaining rows contain
the significance levels at which the
hypothesis of no difference in underlying
population means is rejected using t tests
for independent means. In the first row,
for example, means on the 13 content
scales were compared for the Air Force
men and the outpatient men groups. The
hypothesis of no difference in group
means was rejected at p < .05 for Social
Maladjustment, at p < .01 for Depression,
and at p < .001 for Religious Fundamentalism.
Significant mean differences on the
Authority Conflict scale were not found
between Air Force men and outpatient
men. This type of summary does not provide
information on the extent of mean differences
nor even their direction. Also, with
this many comparisons it is to be expected
that at least several will not be replicable.
The point of the present analysis is not to
attach significance to any single comparison
but rather to provide an overview of
the content scales which differ most from
sample to sample and of the sample comparisons
which yield the greatest differences.
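
The caution about replicability can be made concrete with a little arithmetic. Thirteen group comparisons on each of 13 scales amount to 169 t tests, so some rejections are expected by chance alone. The sketch below is illustrative only: the function names are assumptions introduced here, and the second function treats the tests as independent, which does not hold for these data because the scales are correlated and the same groups enter several comparisons.

```python
def expected_chance_rejections(n_comparisons, n_scales, alpha):
    """Expected number of rejections if every null hypothesis were true."""
    return n_comparisons * n_scales * alpha


def prob_at_least_one(n_tests, alpha):
    """Chance of one or more false rejections, ASSUMING independent tests."""
    return 1 - (1 - alpha) ** n_tests


if __name__ == "__main__":
    print(expected_chance_rejections(13, 13, 0.05))
    print(prob_at_least_one(13 * 13, 0.05))
```

At the .05 level the expected count is 169 x .05 = 8.45, which is consistent with the statement that at least several differences will not replicate.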

The content scales which contributed 
most to differences among this particular 
sample of diverse groups were Poor Morale, 
Organic Symptoms, Phobias, and Depression.
Lesser, but not insubstantial, differences
occurred with Poor Health,
Manifest Hostility, Feminine Interests, Au- 
thority Conflict, and Psychoticism. Con- 
tent scales whose means did not differ 
greatly among the present samples are 
Social Maladjustment, Religious Funda- 
mentalism, Family Problems, and Hypo- 
mania. 

As might be expected, mean content scale 
scores differ most when college groups are 
compared with same-sex patient groups, 
both inpatient and outpatient. What might 
not be anticipated are the substantial dif- 
ferences between college and Air Force 
men. The differences between outpatient 
men and women probably reflect the unrepresentative
nature of a female sample
at an Air Force installation. The Air Force
“normal” sample differs very slightly from
both inpatient and outpatient male samples.
Differences between outpatient and
inpatient men are also slight.

To provide a context of comparison for
group differences obtained with the content
scales, the same analysis was performed
using the 13 standard MMPI
clinical scales. A summary of this analysis
is presented in Table 13. The number
of significant differences obtained with the
clinical scales is similar to that obtained
with the content scales. Again, the greatest
mean scale score differences occur when
college groups are contrasted with patient
groups. Relatively few differences are
found between Air Force and patient samples
or between inpatient and outpatient
males. Although the number of significant
differences is similar, the clinical scales in
general tend to allow for rejection of the
null hypothesis at a slightly higher significance
level than is possible with the content
scales. Considering that the clinical
scales were specifically constructed for the
purpose of discriminating normal from abnormal
samples, the slight edge they possess
over the content scales in the present
analysis is not an impressive one. Whatever
is represented in mean scale score
comparisons across diverse groups is
clearly present in the MMPI content
scales as well.

TABLE 13
CONFIDENCE LEVELS FOR t TESTS OF GROUP MEAN DIFFERENCES IN CLINICAL SCALE SCORES

L | F | K | Hs | D | Hy | Pd | Mf | Pa | Pt | Sc | Ma | Si

AFM versus OPM    .001 | .001 | .001 | .001
OPM versus OPW    .05 | .01 | .001 | .001 | .001 .001 | .01 | .001 | .01 .001
OPW versus CW     .001 | .001 | .001 | .001 | .001 | .001 | .001 | .05 | .001 | .001 | .001 | .01 .001
CW versus IPW     .001 | .001 | .001 | .001 | .001 .001 | .001 | .001 | .001 | .001 | .001 | .001
OPM versus CM     .001 | .001 | .001 | .001 | .001 | .001 | .001 .001 | .001 | .001 | .01 .001
IPW versus OPW    .01 | .01 | .001 | .001 | .001 | .01 .01 | .001 | .001 .001
OPM versus IPM    .001 | .001 | .001 .001 | .001 | .01 .01
CM versus AFM     .001 | .001 | .001 | .001 | .001 .001 | .01 .01 | .001 | .001 | .001 | .001
CM versus IPM     .001 | .001 | .05 | .001 | .001 .001 | .05 | .001 | .01 .001 .001
AFM versus IPM    .05 .001 .001 .05 | .001
CM versus CW      .05 | .01 .001 | .001 | .001 .001
F ratio           21.62 | 16.56 | 8.81 | 35.55 | 40.61 | 27.20 | 31.99 | 200.71 | 10.23 | 20.71 | 21.50 | 9.69 | 20.5


Ordering of Group Means on Content 
Scales 


In addition to assessing the overall con- 
tribution of content scale scores to group 
differences, it is of interest to examine the 
ordering of means for diverse groups 
within each of the separate content scales. 
Such a procedure is useful in suggesting 
underlying psychological continua which 
may be associated with scale scores 
(Gough, 1960). In the present instance 
this approach should not be considered 
validational as the content scales were not 
devised to serve specific group discrim- 
inative purposes. It is assumed that the di- 
mensions underlying the content scales 
are substantive and although primarily
pathological in nature not necessarily 
equivalent to the dimensions which con- 
tribute to the fact of membership in a so- 
cially or psychiatrically defined group. 

From the male samples described in Tables
9 and 11, 16 subgroups were selected
to represent a broad range of socioeconomic,
educational, and psychiatric variables.
Content scale means and standard
deviations for each subgroup are provided
separately for the 13 content scales
in Appendix B (Tables B1-B13). For
each content scale, the groups have been
ordered by mean scale score. In addition
to providing suggestions of underlying
dimensions, the data in Appendix B provide
preliminary and admittedly inadequate
normative information for male samples.

Even a cursory inspection of the tables
in Appendix B reveals that the content
scales, as a group, do not provide a
measure of “pathology” that is consistent
with the conventional psychiatric meaning
of this term. Space limitations do not permit
a detailed scale-by-scale analysis, but
certain generalizations may be stated.
When the rank order of each group within
each scale is pooled across the 13 scales,
four reasonably consistent groupings may
be distinguished. In the first group are the
brain disorders, the outpatient personality
pattern and trait disturbances, and the Air
Force normals, all of whom tend to be
among the highest scorers on the content
scales. Next in order are the inpatient
and outpatient sociopathic disturbances,
the affective psychoses, and the special
symptom outpatient group. Following this
group are the inpatient and outpatient
neurotic disorders, the schizophrenic psychoses,
and the inpatient personality
trait disturbances. The lowest scoring
group tends to consist of the inpatient
personality trait disturbances, the outpatient
brain disorders, college students, and
outpatient transient situational disturbances.
The foregoing generalizations tend
to obscure large individual differences
among content scales. The pattern of such
differences can only be discerned by detailed
examination of Tables B1 through
B13.
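
The pooling of rank orders described above can be sketched as follows. Groups are ranked within each scale from highest to lowest mean, and the ranks are then summed across scales. This is one plausible reading of the procedure rather than a documented algorithm; the function names, group labels, and means below are hypothetical and are not taken from Appendix B.

```python
# Hypothetical scale means for three hypothetical groups (NOT Appendix B data).
HYPOTHETICAL_MEANS = {
    "scale_1": {"group_a": 12.0, "group_b": 9.0, "group_c": 7.5},
    "scale_2": {"group_a": 10.0, "group_b": 11.0, "group_c": 6.0},
    "scale_3": {"group_a": 8.0, "group_b": 7.0, "group_c": 9.0},
}


def ranks_within_scale(means):
    """Rank groups on one scale; rank 1 is the highest mean (ties not handled)."""
    ordered = sorted(means, key=means.get, reverse=True)
    return {group: rank for rank, group in enumerate(ordered, start=1)}


def pooled_rank_order(scale_means):
    """Sum each group's within-scale ranks and order groups by the total."""
    totals = {}
    for means in scale_means.values():
        for group, rank in ranks_within_scale(means).items():
            totals[group] = totals.get(group, 0) + rank
    ordering = sorted(totals, key=totals.get)  # lowest total = highest scorer
    return ordering, totals


if __name__ == "__main__":
    print(pooled_rank_order(HYPOTHETICAL_MEANS))
```

A low rank total marks a group that tends to score high across scales, which is the sense in which the four groupings above were distinguished.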


DIFFERENTIAL DIAGNOSIS OF 
PSYCHIATRIC INPATIENTS 


The preceding consideration of scale 
mean distributions across a variety of 
samples was designed to explicate the mean- 
ing of MMPI scales formed from substan- 
tive rather than group discriminative con- 
siderations. Should such scales be applied 
to problems of psychiatric classification, 
it is hoped that the approach would be 
multivariate rather than single scale. Fur- 
ther, the problem typically facing the 
practicing diagnostician is not that of dis- 
tinguishing disparate groups such as Air 
Force personnel and college students, but 
rather that of distinguishing putative sub- 
groupings within a single population. 

As an example of the use of MMPI con- 
tent scales in a realistic diagnostic problem, 
multiple discriminant analytic procedures 
were applied to the classification of psy- 
chiatric inpatients. From Table 11 it can be 
seen that by combining the diagnostic cate- 
gories of personality pattern and personality 
trait disturbance into the single category of 
personality disturbance (men = 32; women 
= 13) and by eliminating the rare and in- 
determinate categories (men = 53; women 
= 16), six major diagnostic groupings can 
be formed for men (n = 219) and women 
(n = 176), respectively. These groupings 
are: (a) brain disorders, (b) affective psy- 
choses, (c) schizophrenic psychoses, (d) 
psychoneurotic disorders, (e) personality 
disorders, and (f) sociopathic disorders. 

Using the 13 MMPI content scales as 
predictors, multiple discriminant analyses 
were performed separately on these six 
groups of men and six groups of women in- 
patients. The main purpose of this analysis 
was to test the generalized, multivariate 
null hypothesis that these six diagnostic 
groups have similar content scale scores. 
Should rejection of this hypothesis seem 
tenable, the contribution of the separate 
content scales to the main discriminant 
functions would shed light on their relative 
diagnostic importance. Evaluation of the
replicability of the functions derived and
their efficiency in classifying other samples
of psychiatric patients must await further
data collection. The method of analysis and
presentation of findings is that of Cooley
and Lohnes (1962, pp. 116-133).

TABLE 14
DISCRIMINANT ANALYSIS OF SIX INPATIENT DIAGNOSTIC GROUPS BASED
ON 13 MMPI CONTENT SCALES
(N = 219 men)

                    Scaled vectors
Scales       I       II      III      IV       V
REL       -.058   -.046   -.003   -.098    .082
SOC       -.058    .047    .239   -.032   -.105
DEP        .060   -.727   -.146    .114    .056
MOR       -.229    .300    .040   -.182    .508
AUT       -.402    .016   -.210    .196   -.080
PHO       -.054    .220   -.053    .291   -.004
HOS        .518    .054   -.002   -.842   -.213
ORG       -.187   -.293   -.087   -.040   -.B54
PSY       -.134    .200    .573    .040   -.037
FAM        .117    .006    .016    .260    .091
HYP       -.091    .107   -.465   -.140   -.232
HEA        .145    .132    .091   -.206    .306
FEM       -.004    .037    .287    .007   -.067

Latent roots        Trace (percent)
λ1 = .2089              40.62
λ2 = .1522              29.60
λ3 = .0894              17.38
λ4 = .0525              10.20
λ5 = .0113               2.19

Trace = .5142; Λ = .6192; F(65, 954) = 1.56, p < .01.

Tables 14 and 15 present a summary of 
findings from the discriminant analysis of 
six groups based on 13 content scales for 
the 219 men and 176 women, respectively. 
Where the number of groups is less than 
the number of predictors, the maximum 
number of discriminants is one less than the 
number of groups or, in the present in- 
stance, five. The five latent roots are pre- 
sented along with their associated vectors 
(coefficients) which have been adjusted to 
permit comparison of their relative contri- 
bution to the discriminant function. The 
generalized null hypothesis was evaluated 
by Wilks’ lambda which expresses the ratio 
of pooled within groups cross-product de- 
viation scores to total sample cross-product 
deviation scores. In testing the significance 
of lambda, the F approximation of Rao was 
employed (Cooley & Lohnes, 1962, pp. 61-62).
In the male sample, F = 1.56, which for
65 and 954 df is significant at p < .01. For
the women F = 1.53, which for 65 and 750
df is also significant at p < .01. This makes
tenable the rejection of the hypothesis of 
the equality of mean vectors for the six 
groups. 
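
The test statistic can be approximately reproduced from the latent roots in Table 14. The sketch below assumes that the roots are those of the within-groups-inverse times between-groups matrix, so that lambda is the product of 1/(1 + root), and it uses the usual textbook form of Rao's F approximation. The function names are introduced here for illustration, and the computation is a check on the reported figures rather than the original program.

```python
import math

# Latent roots for the male sample, copied from Table 14.
MEN_ROOTS = [0.2089, 0.1522, 0.0894, 0.0525, 0.0113]


def wilks_lambda(latent_roots):
    """Wilks' lambda as the product of 1 / (1 + root) over all roots."""
    lam = 1.0
    for root in latent_roots:
        lam /= 1.0 + root
    return lam


def rao_f(lam, n_total, n_vars, n_groups):
    """Rao's F approximation to the distribution of Wilks' lambda.

    Returns (F, df1, df2). Undefined when n_vars**2 + (n_groups-1)**2 == 5.
    """
    p, q = n_vars, n_groups - 1
    s = math.sqrt((p ** 2 * q ** 2 - 4) / (p ** 2 + q ** 2 - 5))
    m = n_total - 1 - (p + n_groups) / 2
    df1 = p * q
    df2 = m * s - p * q / 2 + 1
    y = lam ** (1 / s)
    f_ratio = (1 - y) / y * df2 / df1
    return f_ratio, df1, df2


if __name__ == "__main__":
    lam = wilks_lambda(MEN_ROOTS)
    print(lam, rao_f(lam, n_total=219, n_vars=13, n_groups=6))
```

For the 219 men, 13 scales, and six groups this gives a lambda of about .619 and an F of about 1.57 on 65 and roughly 954 degrees of freedom, in close agreement with the values reported in Table 14.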

The coefficients associated with each of 
the five discriminant functions are of in- 
terest in assessing the relative contributions 
of the content scales to classification of an 
inpatient population. From a practical 
standpoint, it should be noted that three 
discriminant functions are probably sufficient
for both men and women as they account
for approximately 88% of the discriminating
power of the scales in each
sample. It should also be noted that the 
three discriminant functions in each analy- 
sis are sufficiently different in pattern for 
men and women as to discourage pooling 
data for sexes in a hospital population. 
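
The figure of approximately 88% is the cumulative percentage of trace for the first three latent roots. A minimal sketch of that arithmetic follows, using the roots printed in Tables 14 and 15; the function names are assumptions made for the illustration.

```python
# Latent roots copied from Table 14 (men) and Table 15 (women).
MEN_ROOTS = [0.2089, 0.1522, 0.0894, 0.0525, 0.0113]
WOMEN_ROOTS = [0.3758, 0.1382, 0.0834, 0.0488, 0.0304]


def percent_of_trace(latent_roots):
    """Each root expressed as a percentage of the sum of all roots."""
    trace = sum(latent_roots)
    return [100 * root / trace for root in latent_roots]


def cumulative_percent(latent_roots, n_functions):
    """Percentage of trace carried by the first n_functions roots."""
    return sum(percent_of_trace(latent_roots)[:n_functions])


if __name__ == "__main__":
    print(cumulative_percent(MEN_ROOTS, 3))
    print(cumulative_percent(WOMEN_ROOTS, 3))
```

The first three functions carry about 87.6% of the trace for men and about 88.3% for women.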

For men the largest contributors to group
discrimination along the first discriminant
function are Hostility and Authority Conflict.
Inspection of group means for these
scales (Tables B10 and B6 of Appendix B)
indicates that they are relatively effective in
separating sociopathic and brain-disorder
groups from schizophrenic, neurotic, and
personality disturbances. Depression and
Poor Morale contribute to group discrimination
along the second discriminant. Mean
Depression scale scores (Table B2) for
brain disorder and neurotic groups are well
above those for personality, affective and
schizophrenic groups. Mean scores for Poor
Morale (Table B4) reflect a separation between
the brain-disorder group and the
others just mentioned. Psychoticism and
Hypomania are the largest contributors to
the third discriminant function. Mean PSY
scores (Table B7) suggest a clear psychotic-neurotic
distinction with brain, sociopathic,
schizophrenic, and affective groups in the
former category and neurotic and personality
groups in the latter. Mean HYP scores
suggest a similar dichotomy with the interesting
exception of schizophrenics being
classified toward the “neurotic” pole of the
implied continuum.

TABLE 15
DISCRIMINANT ANALYSIS OF SIX INPATIENT DIAGNOSTIC GROUPS BASED
ON 13 MMPI CONTENT SCALES
(N = 176 women)

                    Scaled vectors
Scales       I       II      III      IV       V
REL        .018    .082    .044   -.022    .042
SOC        .090   -.108   -.056    .097    .165
DEP       -.056   -.187   -.325   -.527   -.137
MOR        .377    .299    .086    .189    .211
AUT        .774   -.173    .381   -.090   -.129
PHO       -.159    .046    .010   -.126    .198
HOS       -.196    .021   -.304    .065   -.253
ORG        .257    .223   -.024    .186   -.316
PSY       -.406   -.055    .020    .120    .149
FAM        .104   -.230   -.104    .166   -.118
HYP        .062   -.005    .036    .008    .159
HEA       -.218   -.049    .167    .182    .224
FEM        .035   -.179    .015    .125    .082

Latent roots        Trace (percent)
λ1 = .3758              55.54
λ2 = .1382              20.43
λ3 = .0834              12.33
λ4 = .0488               7.21
λ5 = .0304               4.49

Trace = .6765; Λ = .5455; F(65, 750) = 1.53, p < .01.

In the analysis based on women (Table
15) the largest contributor to discrimination
along the first discriminant function is
Authority Conflict. Inspection of group
means for this scale indicates, as with the
men, a separation of sociopathic (mean =
10.71) and brain disorder (9.50) groups
from schizophrenics (8.96), neurotics (8.52),
and personality groups (8.75, 8.20). The
affective psychosis group attained the
lowest mean score on this scale (7.63).
The Psychoticism scale is the second largest
contributor to the first discriminant function.
In women, high mean PSY scores
are obtained for schizophrenics (13.78) and
sociopaths (10.43), while lower mean scores
characterize neurotics (9.57), brain disorder
(8.44), affective (7.74), and personality
groups (9.40, 7.13). Poor Morale contributes
additionally to the first discriminant function
and has the largest coefficient on the
second. High MOR scores were obtained
for personality pattern (12.60), neurotic
(12.16), and affective psychotic (10.89)
groups. Lower scores were obtained by personality
trait (9.88), sociopaths (9.07), and
brain disorders (8.50).

Depression and Hostility (which were
significant contributors to the first and second
discriminants for men) contribute, in
addition to Authority Conflict, to the third
discriminant function for women. Mean
DEP scale scores for women form a continuum
on which personality pattern
(12.80), neurotic (11.39), and schizophrenic 
(10.78) groups are high and affective (9.89) 
and brain disorder (6.69) groups are low. 
On the Hostility scale, schizophrenic (9.73) 
and personality groups (8.60, 8.37) are high, 
while brain disorder (7.25) and affective 
psychotic (7.22) groups are relatively low. 

Any attempt to summarize the relative 
importance of the content scales in con- 
tributing to classification of psychiatric pa- 
tients by multiple discriminant analysis 
must be restricted in generalization to the 
present sample. In addition to possible 
problems of sample specificity, the present 
analysis was restricted to six diagnostic 
groupings which, although not arbitrary, 
may not be the groupings desired in other 
hospital settings. Nevertheless it seems im- 
portant to note that Authority Conflict, 
Poor Morale, Hostility, Psychoticism, and 
Depression were important contributors to 
group discrimination as were, to a lesser extent,
Family Problems, Organicity, and
Hypomania. Scales which contributed relatively
little to the present analysis were
Religious Fundamentalism, Social Malad- 
justment, Phobias, Poor Health, and Femi- 
nine Interests. 


Discriminant Analysis Based on MMPI 
Clinical Scales 


Multiple discriminant analyses were also 
performed using the 13 standard MMPI 
clinical scales as predictors of the six diag- 
nostic groupings. Although not the primary 
concern of the present study, it was hoped 
that such analyses would provide a context 
of comparison for the analyses of content 
scales as well as insight into the utility of 


a multivariate approach to this most famil- 
iar diagnostic problem. Summaries of the 
results of these analyses for men and women 
are provided in Tables 16 and 17. 

As before, the significance of Wilks’ 
lambda was evaluated by the F approxima- 
tion of Rao. In the male sample, F = 1.62 
which for 65 and 954 df is significant at p 
< .01. For the women, F = 1.49 which for 
65 and 750 df allows rejection of the null 
hypothesis at p < .01. As with the content 
scales, the hypothesis of no difference be- 
tween mean vectors for the six groups can 
be confidently rejected. Again, from a prac- 
tical standpoint the dimensionality of the 
predictor space might be reduced to three 
since the first three discriminant functions 
account for 86% and 88% of the discrimi- 
nating power of the scales in male and fe- 
male samples, respectively. 

In the male sample, the predominant con- 
tribution of Sc and Pt to group discrimina- 
tion can be seen to be operative in all but 
the fourth discriminant function. Hy and K 
contribute additionally to the first discrimi- 
nant while Si and Pd are of importance to 
the second. The contribution of F to group 
discrimination is seen in the third and
fourth discriminant. Pa is involved in the
last three discriminants and Hs contributes 
to the fourth. Clinical scales L, D, Mf, and 
Ma contributed relatively little to the pres- 
ent analysis. 

The first discriminant function in the fe- 
male sample is even more clearly dominated 
by Sc and Pt; the latter scale in this in- 
stance being the more heavily weighted. Hy 
contributed additionally to the second dis- 
criminant and with Pd to the third as well. 
The scales which contribute most to the 
present discriminant analysis are clearly 
Pt, Sc, Hy, and Pd. Lesser contributions 
come from F, Hs, D, Ma, and K. Scales L, 
Mf, and Si are of only minor importance. 


TABLE 16
DISCRIMINANT ANALYSIS OF SIX INPATIENT DIAGNOSTIC GROUPS BASED
ON 13 MMPI CLINICAL SCALES
(N = 219 men)

                    Scaled vectors
Scales       I       II      III      IV       V
L         -.059    .294    .126    .089   -.104
F         -.011   -.079   -.677   -.612   -.137
K         -.470   -.251    .060   -.046    zl
Hs        -.257   -.062    .032   -.410    .234
D         -.094    .257   -.235    .103    .025
Hy         .516   -.025   -.375    .240   -.050
Pd         .271   -.342    .084   -.176   -.247
Mf        -.169   -.075    .109    .123    .181
Pa         .007    .036   -.304    .949    .474
Pt         .465    .387    .406    .195   -.897
Sc        -.632    .413    .531    .264   -.406
Ma         .041   -.113   -.121   -.162    .268
Si        -.001   -.415    .042   -.290    .064

Latent roots        Trace (percent)
λ1 =                    39.48
λ2 =                    29.99
λ3 =                    16.70
λ4 =                     9.46
λ5 =                     4.37

Trace = .5305; Λ = .6098; F(65, 954) = 1.62, p < .01.


FACTORIAL STRUCTURE OF CONTENT SCALES


Unlike the standard MMPI clinical
scales, the MMPI content scales do not
share common items and were constructed
in such a way as to maximize the homogeneity
of each scale. Nevertheless, the criterion
employed for scale independence (in the
correlational sense) during item analysis 
was quite minimal and the number of sepa- 
rate substantive dimensions involved in 
this set of scales is certainly less than 13. 
It is of interest, therefore, to examine the 
nature of the factor structure underlying 
the content scales and to do so with refer- 
ence to the manner in which this structure 
is manifest in diverse populations. The sam- 
ples selected for such analysis were: (a) 
261 Air Force enlisted men, (b) 258 male 
psychiatric inpatients, and (c) 100 Uni- 
versity of Illinois male students. Although 
sex (male) and geographical locale (Illi- 
nois) are common for these samples, they 
are assumed to vary on a large number of 
other demographic characteristics. 
Matrices of intercorrelations among the
13 content scales were factored by the
method of principal components. Factors
were retained whose latent roots were
greater than one. Three factors met this
criterion in each of the samples.12 The retained
factors accounted for 69%, 71%, and
62% of the total scale variance in the Air
Force, psychiatric, and college samples, respectively.
The factor matrices were rotated
to a varimax criterion. The rotated
factor matrices for each of the three samples
are presented in Table 18.

Factor I. The first factor in the Air Force
sample is a large (53% of common variance)
and general dimension of self-reported
maladjustment which is substantially
loaded by all but three of the MMPI
content scales. Organic Symptoms, Phobias,
Poor Health, and Depression are especially
highly loaded on this factor in the psychiatric
sample as they are in the Air Force
sample.

12 Although the employment of Guttman’s
weaker, lower bound as a criterion for determining
the number of factors will not be given a
general defense here, it is of interest to note that
in this particular instance other criteria yield
equivalent results. As part of a larger methodological
study, Linn (1965) analyzed the present
Air Force data and found the same number of
factors to be indicated by: (a) inflection in the
curve of latent roots and (b) the mean square
ratio between factor loadings based on the original
data matrix and factor loadings based on a
data matrix augmented by randomly generated
variables.

TABLE 17
DISCRIMINANT ANALYSIS OF SIX INPATIENT DIAGNOSTIC GROUPS BASED
ON 13 MMPI CLINICAL SCALES
(N = 176 women)

                    Scaled vectors
Scales       I       II      III      IV       V
L         -.110   -.114   -.082   -.056   -.146
F         -.146    .207    .017    .269    .232
K          .224    .047   -.242    .181    .098
Hs         .008   -.194    .016   -.844    .107
D         -.091   -.020    .254    .316   -.080
Hy        -.031    .320    .343   -.087    .223
Pd         .188   -.206   -.457   -.192    .100
Mf        -.008   -.254   -.040    .010    .080
Pa        -.071    .228   -.190   -.265   -.405
Pt         .893    .419   -.286   -.044   -.278
Sc        -.568   -.446   -.104    .022    .280
Ma        -.071   -.008    .214    .302    .086
Si        -.021   -.023    .087    .189   -.119

Latent roots        Trace (percent)
λ1 = .2425              39.80
λ2 = .1561              25.63
λ3 = .1359              22.31
λ4 = .0408               6.70
λ5 = .0339               5.56

Trace = .6092; Λ = .5095; F(65, 750) = 1.49, p < .01.

The maladjustment factor in the Air 
Force sample is highly loaded by categories 
of physical complaint but is also clearly a 
general factor of psychological complaint as 
well. In the psychiatric sample, the factor 
is less general, and hence the emphasis on 
physical complaint is more prominent. In 
the college sample this trend is reversed. 
Here the first factor is one which predominately
emphasizes Social Maladjustment,
Poor Morale, and Depression. Phobias are 
highly loaded on Factor I, but the cate- 
gories of Organic Symptoms and Poor 
Health are loaded on another factor (Factor 
III). One of the factors underlying the re- 
lations among the content scales would thus 
appear to be a maladjustment or complaint 
factor. Its generality and relative emphasis 
on psychological, social and somatic symp- 
toms would seem to vary with the popula- 
tion studied, however. 

Factor II. The second factor in the Air 
Force sample is primarily loaded by Au- 
thority Conflict, Hypomania, and Manifest 


Hostility. More moderate loadings are con- 
tributed by Psychoticism, Depression, Fam- 
ily Problems, and Poor Morale. In the 
psychiatric sample this factor is more prom- 
inent, emerging as the first factor in the 
analysis, and accounting for 45% of the 
common variance. Again, the primary load- 
ings are on Manifest Hostility, Hypomania, 
and Authority Conflict. Somewhat more 
substantial loadings occur on Family Con- 
flict, Psychoticism, Poor Morale, and De- 
pression than was the case in the Air Force 
sample. In the college sample this factor 
appears to be a slightly more general one 
which is distinguished by a substantial 
negative loading on Religious Fundamen- 
talism. In all samples, the second factor 
underlying the relations among content 
scales is one emphasizing a cynical, dis- 
trustful, exploitive attitude toward life, 
hostility toward others, and restless, high- 
strung energy (Table 8). This aggressive 
orientation is accompanied by generally 
low morale and self-esteem and indications 
of coming from a home with a similar orientation
(FAM). In college students this
orientation seems to include an element of
atheism or, at least, deviation from fundamentalist
religious convictions.

TABLE 18
ROTATED FACTOR MATRICES OF CONTENT SCALES IN THREE MALE SAMPLES

        Chanute Air Force Base    Kankakee psychiatric     Illinois college men
        normals (N = 261)         inpatients (N = 258)     (N = 100)

Scale     I    II   III   h²        I    II   III   h²       I    II   III   h²
ORG      86    19   -05   79       87    19    12   80      28    35    62   59
PHO      83^ 7 10 HORE TS O08 724. 30 . 67 63 œ 39 56
HEA      78    19    06   64       80    14    08   66      22    45    56   57
DEP      76    47    22   85       66    63    08   84      75    38    26   78
PSY      74    48    00   77       46    68    33   78      43    59    44   72
MOR      67    46    32   77       57    67    13   79      78    36    11   76
FEM      58   -17   -22   41       20    05    75   61     -09   -17    71   54
SOC      57    03    50   57       59    16   -08   38      83   -06   -11   70
FAM      53    46   -12   50       26    73   -12   61      44    40     1   36
AUT      01    85   -05   73       03    83    05   69      30    78    06   69
HYP      11    82    13   71       19    84    23   80      27    67    20   56
HOS      41    77   -02   76       27    86    19   85      47    69    04   71
REL     -07   -01    85   73      -05    15    82   69      21   -66    04   48

Variance
(percent) 53    33    14            37    45    18           40    39    21
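
The h² column of Table 18 is the communality of each scale, that is, the sum of its squared loadings on the three retained factors. The short sketch below checks this for three legible Air Force rows; the function name and the dictionary layout are assumptions made for the illustration. Small discrepancies from the printed values are to be expected because the printed loadings are rounded to two places.

```python
# Air Force loadings (I, II, III) and printed communalities from Table 18,
# with decimal points restored.
AIR_FORCE_ROWS = {
    "ORG": ((0.86, 0.19, -0.05), 0.79),
    "DEP": ((0.76, 0.47, 0.22), 0.85),
    "REL": ((-0.07, -0.01, 0.85), 0.73),
}


def communality(loadings):
    """Sum of squared loadings of one scale across the retained factors."""
    return sum(a * a for a in loadings)


if __name__ == "__main__":
    for scale, (loadings, printed_h2) in AIR_FORCE_ROWS.items():
        print(scale, round(communality(loadings), 3), printed_h2)
```

An orthogonal rotation such as varimax leaves these communalities unchanged, so the same check applies to unrotated and rotated loadings alike.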

Factor III. In the Air Force sample the 
third factor is defined uniquely by Religious 
Fundamentalism with a secondary loading 
on Social Maladjustment. In the psychi- 
atric sample, Religious Fundamentalism 
and Feminine Interests define the factor. 
Feminine Interests define the third factor 
in college males but Religious Fundamen- 
talism is noticeably not involved, and Or- 
ganic Symptoms and Health Concern 
emerge as scales not involved in the third 
factor for the Air Force or psychiatric sam- 
ples. The three samples rather clearly differ 
with respect to the manner in which Re- 
ligious Fundamentalism and Feminine In- 
terests enter into the underlying factor 
structure. In the Air Force and college sam- 
ples, Feminine Interests load a factor char- 
acterized by both somatic and psychological 
complaint. In the psychiatric sample, however,
Feminine Interests load a factor
primarily characterized by Religious Fundamentalism.
Whereas Religious Fundamentalism
defines a relatively distinct factor
in the Air Force and psychiatric
samples (i.e., Factor III), this category is
associated with the hostility factor (Factor
II) in the college sample.


Interpretation of Factorial Dimensions Un- 
derlying Content Scales 


The preceding analyses suggested that 
the number of dimensions underlying the 
MMPI content scales is the same for quite 
different populations, but that the specific 
structuring of these dimensions varies with 
the population studied. Although the mean- 
ing of the content scales is, to some extent, 
self-explanatory, little attempt was made to 
make substantive interpretations of the 
three factors in the different populations. 
Such interpretations will require, as a mini- 
mum, the employment of marker scales 
which will coordinate the present findings 
with the extensive empirical literature of 
the factorial structure of the MMPI. In 
addition to the statistical identification of 
factors, it will also be necessary to relate the 
apparent substantive nature of the factors 
identified to what is known about their 
extratest correlates. Such an analysis will 
first require a brief consideration of the con- 
siderable literature that exists on the fac- 
torial structure of the MMPI. 

With the exception of certain factorially 
derived inventories such as Cattell’s Six- 
teen Personality Factor Questionnaire 
(Cattell & Stice, 1962), the MMPI has been 
subjected to more factor analytic investiga- 
tions than any other test in widespread use. 
Although subject to considerable interpretative
controversy, a rather remarkable
agreement exists as to the dimensionality 
of this instrument. When the intercorrela- 
tions of MMPI clinical scales are factored, 
two substantial factors emerge which ac- 
count for the vast majority of common vari- 
ance. These factors are consistently marked 
by Welsh’s (1956) A and R, respectively, 
which were developed for this purpose. De- 
pending on the investigator’s tolerance for 
percentage of variance extracted, several 
additional smaller factors have been iden- 
tified which appear more subject to varia- 
tion, as a function of scales included and 
populations studied, than do the first two 
factors. 

Early factorial studies of the MMPI 
tended to label the first factor “personality 
maladjustment” (Cook & Wherry, 1950), 
“psychotic maladjustment” (Cottle, 1950; 
Wheeler, Little, & Lehner, 1951), and “anxiety”
(Eichman, 1962; Welsh, 1956). More
recent studies tend to interpret both poles 
of the first factor and to relate it to a 
broader theoretical context, such as “anxi- 
ety vs. dynamic integration” (Karson & 
Pool, 1957, 1958), “ego-weakness vs. ego- 
strength” (Kassebaum, Couch, & Slater, 
1959), and “general complaint vs. dynamic 
integration” (Gocka & Marks, 1961). In a 
similar fashion, interpretations of the second
factor of the MMPI have changed from
“overactivity and recklessness” (Cook &
Wherry, 1950), “neurotic adjustment”
(Cottle, 1950; Wheeler et al., 1951), and
“repression” (Eichman, 1962; Welsh, 1956)
to the broader category of “extraversion
vs. introversion” (Gocka & Marks, 1961;
Karson & Pool, 1957, 1958; Kassebaum et
al., 1959). Additional factors, beyond the
first two, have been variously labeled as
“paranoia” (Cook & Wherry, 1950;
Wheeler et al., 1951), “feminine interests”
(Cook & Wherry, 1950; Cottle, 1950),
“somatization” and “unconventionality”
(Eichman, 1962), and “tender-minded sensitivity”
(Gocka & Marks, 1961; Kassebaum
et al., 1959).

During the last decade, the foregoing sub- 
stantive interpretations of MMPI factors 
have been seriously challenged by an argumentative
and prolific group of writers devoted
to the demonstration of response
styles and sets in the MMPI which are 
alleged to vitiate or, at best, severely limit 
the credibility of such substantive inter- 
pretations (see Rorer, 1965). Edwards and 
his colleagues have steadfastly maintained 
that the first factor of the MMPI is best 
thought of as reflecting “social desirability” 
(Edwards, 1957, 1961, 1962; Edwards & 
Diers, 1962; Edwards, Diers & Walker, 
1962; Edwards & Heathers, 1962; Edwards 
& Walker, 1961; Edwards & Walsh, 1963). 
Others have maintained that such stylistic 
tendencies as “acquiescence” (Messick & 
Jackson, 1961) or the tendency to answer 
“deviantly true" (Barnes, 1956a, 1956b; 
Wiggins, 1962) are involved in the first 
factor as well. The second factor of the 
MMPI has been interpreted as reflecting 
“acquiescence,” most notably by Jackson 
and Messick (1958, 1961, 1962). A third stylistic
factor was first identified by Edwards
et al. (1962) and subsequently replicated by 
others (Edwards & Walsh, 1964; Liberty, 
Lunneborg, & Atkinson, 1964; Wiggins, 
1964; Wiggins & Lovell, 1965). This factor 
has been referred to as a "lying" factor 
(Edwards et al., 1962; Liberty et al., 1964) 
and as a "social desirability role-playing" 
factor (Wiggins, 1964). Although this third 
factor appears to be somewhat of a “pure” 
response style factor which is not highly 
related to other sources of variance in the 
test, it is highly loaded by special scales 
which are themselves correlated with the 
tendency to modify answers to the test in 
a socially desirable direction under instruc- 
tions to do so (Cofer, Chance & Judson, 
1949; Boe & Kogan, 1964; Hunt, 1962; 
Skrzypek & Wiggins, 1966; Walker, 1962; 
Wiggins, 1959). 

The practical relevance or even existence
of response styles in the MMPI has recently
been called into question (Block,
1965; McGee, 1962; Rorer & Goldberg,
1965). The most effective defense of sub- 
stantive interpretations of MMPI factors 
has been made by Block (1965), who not 
only challenged stylistic interpretations on 
logical and statistical grounds, but provided 
empirical data which were, to him, demanding
of substantive interpretation. To demonstrate
that stylistic interpretations of the
first two factors of the MMPI are not suffi- 
cient, Block developed what he considered 
to be a desirability-free measure of the first 
factor and an acquiescence-free measure of 
the second factor. To the extent that these 
two scales mark the factors involved, one 
must concede that something “other” than
response styles is involved. More decisive,
however, was Block’s appeal to the long- 
overdue criterion of external evidence of 
substantive dimensions being measured by 
the first two factors. By the method of con- 
trasted groups, Block obtained Q-sort de- 
scriptions by professional psychologists of 
high- and low-scoring subjects on the first 
two factors of the MMPI. These descrip- 
tions were obtained independently of the 
MMPI in five diverse samples of subjects 
under a variety of assessment circum- 
stances. The constellation of Q-sort adjec- 
tives characterizing high- and low-scoring 
subjects on these factors led Block to con- 
clude that the first factor of the MMPI 
measures “ego-resiliency,” while the second 
factor measures “ego-control.” Space limita- 
tions prohibit documentation of the full 
range of closely reasoned arguments which 
led Block to the foregoing conclusion. For
present purposes, it will simply be noted 
that Block has convincingly demonstrated 
the necessity for substantive interpretations 
of these factors, regardless of their degree of
contamination with other sources of variance.

For reasons dictated by the availability 
of original protocols, a substantive inter- 
pretation of the factors underlying the 
MMPI content scales will here only be at- 
tempted in college populations. An attempt 
was made to align the three factors found in 
the small sample of Illinois college men 
(Table 18) with reference to the principal 
stylistic and substantive dimensions sug- 
gested by the recent factor analytic litera- 
ture of MMPI clinical scales. This was ac- 
complished in the considerably larger 
samples of Stanford men and women under- 
graduates. 

Factor analysis of the intercorrelations 
of MMPI content scales in the sample of 
Illinois undergraduate men revealed three
factors (Table 18). The first factor was 
loaded principally by Social Maladjust- 
ment, Poor Morale, and Depression. The 
second factor was loaded by Authority Con- 
flict, Manifest Hostility, and Hypomania 
and negatively by Religious Fundamentalism.
The third factor was loaded by Feminine
Interests and by the Organic Symptoms
and Poor Health scales.

Additional factor analyses were per- 
formed on samples of 250 men and 250 
women from Stanford University. In addi- 
tion to the 13 MMPI content scales, six 
marker variables were included to define the 
traditional MMPI clinical scale space. Four 
of these markers are subject to stylistic 
interpretation, while the remaining two are 
not. Welsh’s (1956) factor Scales A and R 
were included as markers of the first two 
factors of conventional MMPI space. Wig- 
gins’ Sd (Wiggins, 1959) and Cofer et al.’s 
(1949) Cof were included to investigate the 
possible convergence of the third content 
factor with the third stylistic dimension 
previously mentioned (Edwards et al.,
1962). Block's (1965) ER-S and EC-5 were 
included as desirability-free and acquies- 
cence-free measures of the factors he de- 
scribes as “ego-resiliency” and “ego-con- 
trol,” respectively. As before, the inter- 
correlation matrices were factored by 
the method of principal components, and 
factors with latent roots greater than unity 
were rotated analytically to a varimax cri- 
terion. The rotated factor matrices for 
samples of Stanford men and women are 
presented in Table 19. Factor loadings less 
than .33 have been omitted, and the 
matrices have been arranged in such a way 
as to facilitate comparison. 
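
The extraction and rotation steps just described can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the program used for the analyses reported here: the function names (`principal_component_loadings`, `varimax`, `suppress_small`) and the toy correlation matrix are hypothetical, and the rotation uses the common SVD-based varimax iteration.

```python
import numpy as np

def principal_component_loadings(corr):
    """Principal components of a correlation matrix, keeping latent roots > 1."""
    roots, vectors = np.linalg.eigh(corr)        # eigh returns ascending order
    order = np.argsort(roots)[::-1]              # re-sort, largest root first
    roots, vectors = roots[order], vectors[:, order]
    keep = roots > 1.0                           # latent roots greater than unity
    loadings = vectors[:, keep] * np.sqrt(roots[keep])
    return roots, loadings

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a loading matrix (SVD-based iteration)."""
    n_vars, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        column_ss = np.diag((rotated ** 2).sum(axis=0))
        target = rotated ** 3 - (rotated @ column_ss) / n_vars
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = s.sum()
        if criterion != 0.0 and new_criterion < criterion * (1.0 + tol):
            break                                # criterion stopped improving
        criterion = new_criterion
    return loadings @ rotation

def suppress_small(loadings, cutoff=0.33):
    """Blank out loadings below the reporting cutoff (.33 in Table 19)."""
    return np.where(np.abs(loadings) < cutoff, np.nan, loadings)

# Toy correlation matrix (hypothetical values): two clusters of three scales.
toy_corr = np.eye(6)
toy_corr[:3, :3] = 0.64
toy_corr[3:, 3:] = 0.49
np.fill_diagonal(toy_corr, 1.0)

roots, unrotated = principal_component_loadings(toy_corr)
rotated = varimax(unrotated)
reported = suppress_small(rotated)
percent_total_variance = 100 * roots[roots > 1.0].sum() / toy_corr.shape[0]
```

Because the rotation is orthogonal, each scale's communality (the row sum of squared loadings) is the same before and after rotation; only the distribution of variance across factors changes.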

Whereas three factors were obtained from 
the analysis of content scales in the sample 
of Illinois men, five factors are seen to 
emerge when six marker scales are in- 
cluded. These five factors account for 69% 
and 72% of the total variance in the female 
and male samples, respectively. The first 
three factors are recognizable as the same 
obtained in the earlier analysis. The fourth 
factor is a “stylistic” factor determined by 
the inclusion of Sd and Cof. These stylistic 
TABLE 19
Rotated Factor Matrices of Content Scales Plus Six Marker Variables

I II III IV V h² I II III IV V h²
MOR 81 80 79 36 77 
A 80 37 88 85 40 91 
SOC 79 —36 81 62  —56 81 
DEP 78 38 81 78 42 82 
ER-S —57. —42 —87 67 —69 —34 63 
PHO 53 34 49 47 58 57 
HOS 40 72 69 68 37 69 
Cof —40 75 79 —44 7" 78 
FAM 97 34 33 40 40 47 
PSY 33 37 56 59 53 61 69 
R —83 74 —38. —72 78 
HYP 70 66 52 58 71 
EC-5 —70 40 80 —80 83 
AUT 55 49 48 33 34 —35 59 
ORG 82 75 34 77 71 
HEA 78 74 76 69 
Sd 88 78 83 7 
REL 72 58 75 62 
FEM 85 74 87 80 
Variance (percent) 50 18 15 9 8 36 17 22 16 9


scales have little in common with content 
scales other than Religious Fundamental- 
ism, although this, in itself, is an intriguing 
finding. The present space is such that 
Feminine Interests emerges as a fifth quite 
specific factor, distinct from Factor II.

Factor I. Anxiety proneness versus ego
resiliency. The first factor in both samples 
is clearly and unambiguously marked by 
Welsh’s A, which coordinates the first fac- 
tor of content scales with the first factor 
obtained in all studies of clinical scales to 
date. Although Scale A provides a statisti- 
cal identification of the factor, it does not 
allow a choice between stylistic and sub- 
stantive interpretations. Block's ER-S 
which does not admit of stylistic interpreta- 
tion has a substantial, but not unique, load- 
ing on this factor in the present analyses. It 
will not be argued that the present factor is 
free of the contaminating influence of social 
desirability. However, the nature of the
content scales which load this factor is con- 
sidered of more than passing interest. Poor 
Morale, Social Maladjustment, and Depres- 
sion have high loadings on the first factor 
in both groups. The item content of these
scales (Table 8) suggests an individual
lacking in self-confidence, who is socially 
inhibited and given to feelings of guilt and 
apprehensiveness. An individual at the 
other end of the implied continuum would
be characterized by self-confidence and 
optimism; social ascendance and poise; and 
a confident, resilient approach to the future. 
The item content of the scales which mark 
this factor is so close to the independent
behavior descriptions obtained by Block 
(1965) for individuals with high and low 
scores on the same psychometric dimension 
that Block’s suggested label of “ego-resili- 
ency” is here applied to the first factorial 
dimension of MMPI content scales. 

Factor II. Impulsivity versus control. 
The second factor is marked by Welsh’s 
R and Block’s EC-5, with R predominating
in the sample of women and EC-5 marking 
the factor in the sample of men. Although 
R is highly subject to stylistic interpreta- 
tion, EC-5 is not. In light of the recent crit- 
icisms directed at the interpretation of this 
dimension as “acquiescence” (Block, 1965; 
McGee, 1962; Rorer, 1965; Rorer & Goldberg,
1965), the burden of proof of the
utility of such an interpretation is shifted
to its proponents. More germane to the
present analyses is the nature of the content
scales which load this factor. Hypomania
has high loadings on the second
factor in both samples. The items which
constitute this scale emphasize excitement,
restlessness, hotheadedness and overcom- 
mitment. Such items are suggestive of im- 
pulsivity or lack of control at one pole of 
the dimension and control, or possibly, 
“overcontrol” at the other. Manifest Hostil- 
ity and Authority Conflict have high load- 
ings on this factor in the female sample. 
The items in these scales reflect the free 
expression of aggressiveness and the cyni- 
cal, distrustful attitude that everyone 
should get away with what he can. Such 
items are seen as consistent with the lack 
of impulse control suggested by the Hypo- 
mania scale. In the male sample, Manifest 
Hostility and Authority Conflict have 
smaller loadings while Social Maladjust- 
ment and Family Problems contribute more. 
Again, the constellation of content scales 
marking the second factor is highly simi- 
lar to the independent behavior descriptions 
obtained by Block (1965) for this dimen- 
sion which led him to label the factor as a 
“control” dimension. It is also of interest 
to note that Block (1965) has argued that 
the control dimension is expressed differ- 
ently in men and women, which also ap- 
pears to be the case in the present analyses. 

Factor III. Health concern. In both 
samples, the third factor is characterized by 
high loadings on Organic Symptoms and 
Poor Health. The relationship between these 
two scales is rather obvious and would seem 
to warrant the general label of “health con- 
cern” for the factor. In college populations, 
at least, there appears to be an underlying 
factor of concern with health that includes 
both the headaches, dizziness, etc. from the 
Organic Symptoms content scale and the 
gastrointestinal complaints from the Poor 
Health content scale. Lacking the independ- 
ent behavior descriptions, which were avail- 
able for the first two factors, and lacking 
a factor marker which would relate the 
present dimension to previous ones, it seems 
best to view this factor as one involving
“reported poor health.” Considering the
large number of items in the MMPI pool 
which relate to health, it is not surprising 
that such a factor would emerge. A “Poor 
Physical Health” factor has emerged from 
factor analysis of items from several of the 
standard MMPI clinical scales (Comrey, 
1957a, 1957b, 1957c, 1958c; Comrey & 
Marggraff, 1958). Factor analysis of groups 
of MMPI scales has also yielded a “somatization”
factor (Eichman, 1961; Fisher,
1964). 

Factor IV. Social desirability role play- 
ing. This factor was rather clearly deter- 
mined by the inclusion of the stylistic role- 
playing scales which define it: Wiggins’ 
(1959) Sd and Cofer et al.’s (1949) Cof. As 
previously indicated, a considerable number 
of studies attest to the behavioral correlates 
of these scales; namely the tendency to 
modify answers to the MMPI in a socially 
desirable direction when instructed to do so. 
The content scale of Religious Fundamen- 
talism has a high and unique loading on this 
factor in both samples. The most conserva- 
tive interpretation of this finding would be 
that the items in the Religious Fundamen- 
talism scale are those which are most sub- 
ject to change under conditions which en- 
courage faking. However, in view of the 
fact that the Marlowe-Crowne Social De- 
sirability Scale (Crowne & Marlowe, 1960) 
is known to load this factor (Edwards et al., 
1962; Edwards & Walsh, 1964; Liberty et 
al., 1964), a further inference seems justi- 
fied. On the basis of the extensive documen- 
tation of the correlates of the Marlowe- 
Crowne scale (Crowne & Marlowe, 1964) 
which is known to load this factor, it seems 
likely that, in these college samples, individuals
who describe themselves as religious,
churchgoing people may be operating
under a strong motive to gain social ap- 
proval (Crowne & Marlowe, 1964). Such a 
phenomenon may be quite specific to these 
particular samples, however. 

Factor V. Feminine interests. When the 
present set of marker scales is included in
the factor analysis of MMPI content scales, 
Feminine Interests emerges as a specific 
factor. In the earlier factor analyses of 
content scales in several male samples 
(Table 18), the Feminine Interests scale
was seen to vary from sample to sample 
in its factorial contribution. Such a factor 
is reminiscent of the "feminine interests" 
factor which has been reported from time 
to time in the literature (Cook & Wherry, 
1950; Cottle, 1950; Kassebaum et al., 1959; 
Wheeler et al., 1951) and which has
exhibited considerable fluctuation from 
sample to sample. Interpretation of this 
factor must be restricted to the content of 
the items in the Feminine Interests scale 
(Table 8) since its non-MMPI correlates 
have not been investigated. 

In summary, when the factorial dimen- 
sions of the MMPI content scales were 
aligned with previously reported dimensions 
of MMPI clinical scales, considerable con- 
vergence was evident. In a college popula- 
tion, the first two factors were clearly 
marked by Welsh’s (1956) A and R which 
permitted their identification as the first 
two factors of previously reported studies. 
Although the possible contaminating effect 
of “social desirability” could not be ruled 
out, the first factor was interpreted as 
reflecting “anxiety-proneness versus ego 
resiliency.” The second factor was inter- 
preted as “impulsivity versus control” with 
less concern for the possible alternative 
interpretation of “acquiescence.” The third 
factor appeared to reflect “health concern" 
as judged from the item content of the 
scales which loaded it. A relatively specific 
factor of “feminine interests” was identi- 
fied, although its generality across popula- 
tions was questioned. The possibility that 
high scores on the Religious Fundamental- 
ism content scale may be associated with 
high approval motivation (Crowne & Mar- 
lowe, 1964) was also raised. 


IMPLICATIONS FOR CLINICAL INTERPRETATION OF THE MMPI


In several of the analyses previously 
presented, the MMPI clinical scales were 
used as a baseline or frame of reference for 
comparison with the content scales. These 
comparisons involved such issues as inter- 
nal consistency, group differences, and 
differential diagnosis. Although the clinical
and content scales were found to be similar 

Fig. 1. Profile of hypothetical patient on standard MMPI scales.


in many respects, they should not be viewed 
as equivalent from the standpoint of clinical
application. This point is best illustrated
by use of an artificially constructed profile 
of MMPI clinical scale scores. Such a pro- 
file is presented in Figure 1. 

The profile in Figure 1 bears a resem- 
blance to that of “John Doe" presented 
by Shneidman (1951, p. 221). Shneidman's 
patient was a 25-year old single male whose 
primary diagnosis was that of anxiety reac- 
tion but for whom there were indications 
(especially in the MMPI) of an incipient 
schizophrenic reaction. The hypothetical 
profile in Figure 1 differs from John Doe’s 
principally in Pt which is lower and in Mf 
and Sc (also lower) plus a slightly higher 
F. Configurally, the hypothetical profile 
resembles that of a 21-year old single male 
reported in the MMPI Atlas (Hathaway 
& Meehl, 1951, p. 605). The diagnosis for 
this patient was reactive depression with 
a lingering doubt concerning organic pathol- 
ogy. The overall elevation of the profile 
reported in the Atlas was much less than 
that of the hypothetical profile. The hypo- 
thetical profile is not easily classified within 
Marks and Seeman’s (1963) system al- 
though this is, itself, no indictment of its 
typicalness (Huff, 1965). All of the promis- 
ing diagnostic signs employed by Goldberg 


TABLE 20
Raw Scores on Content Scales for Two
Hypothetical Patients with Identical
Clinical Scale Profiles

Content scale   Patient A   Patient B   Difference
ORG                 28           5         +23
PSY                  8          21         -13
HEA                  7          20         -13
FEM                  6          17         -11
FAM                 11           6          +5
DEP                 15          19          -4
HYP                  8          12          -4
PHO                 13           9          +4
SOC                 10          13          -3
HOS                  9          12          -3
AUT                 13          11          +2
REL                  7           9          -2
MOR                 12          11          +1

(1965) classify this profile as “psychotic.” 
Perhaps the best appeal is to “clinical ex- 
perience,” which for many will verify the 
plausibility of encountering a profile such 
as that in Figure 1 in a hospital setting. 

The profile in Figure 1 may be used to 
illustrate the manner in which content 
scales may supplement interpretation of 
standard MMPI profiles. Starting with two 
identical 566-item protocols, each of which 
yielded the clinical scale profile in Figure 
1, content scale scores were varied under 
the restriction that clinical scale scores 
remain the same. This was done to illustrate 
the point that the same profile of clinical 
scale scores can be obtained in two proto- 
cols which differ markedly from each other 
in their content scale scores. Table 20 pre- 
sents the raw content scale scores for two 
hypothetical patients (Patient A and Pa- 
tient B) each of whom produces the identi- 
cal MMPI clinical profile illustrated in 
Figure 1. 
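
The comparison in Table 20 amounts to a scale-by-scale subtraction, ordered by the size of the discrepancy. The short Python sketch below illustrates this for the five scales with the largest differences; the scores are those of Table 20, while the function name `score_differences` is a hypothetical helper introduced only for illustration.

```python
# Raw content scale scores from Table 20 (five largest differences only).
patient_a = {"ORG": 28, "PSY": 8, "HEA": 7, "FEM": 6, "FAM": 11}
patient_b = {"ORG": 5, "PSY": 21, "HEA": 20, "FEM": 17, "FAM": 6}

def score_differences(a, b):
    """Return A minus B for every scale, largest absolute difference first."""
    diffs = {scale: a[scale] - b[scale] for scale in a}
    ordered = sorted(diffs.items(), key=lambda pair: abs(pair[1]), reverse=True)
    return dict(ordered)

differences = score_differences(patient_a, patient_b)
for scale, diff in differences.items():
    print(f"{scale}: {diff:+d}")
```

A positive value indicates that Patient A endorsed more items in the keyed direction on that scale; a negative value indicates the same for Patient B.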

Patient A has admitted to a large number 
of symptoms thought to be indicative of 
organic pathology. Additionally, he admits
having family problems and a number of 
fears. In contrast, Patient B admits to a 
large number of psychotic symptoms of a 
primarily paranoid nature. He is greatly 
concerned about his health and admits to 
liking an unusual number of feminine pur- 
suits. By comparison with Patient A, Pa- 
tient B is generally more deviant with re- 
spect to content categories reflecting poor 
morale, mood instability, social maladjustment,
and hostility.

The configuration of content scale scores 
of Patient B readily confirms the impres- 
sion of psychopathology gained from an 
inspection of the clinical profile in Figure 
1. This could be the profile of a paranoid 
schizophrenic with an underlying homo- 
sexual component and a body concern that 
is delusional in nature. Poor morale, social 
maladjustment, and hostility are, of course, 
compatible with this picture. 

Although Patient A’s raw content scale 
scores are sufficiently deviant to be con- 
sidered those of a hospitalized patient, they 
are in sharp contrast to those of Patient B. 
By comparison, Patient A is almost ex- 
clusively concerned with organic symptoms 
and, to a lesser extent, family problems. 
Evidence of delusional thinking, health con- 
cern, feminine interests, and general mal- 
adjustment is comparatively weak for Pa- 
tient A. The clinical scale profile in Figure 
1 may now be viewed in a quite different 
light. 

The present example does not imply that 
the long-awaited method of differentiating 
schizophrenic from brain disorders has been 
discovered to reside within the MMPI. It 
is meant to imply that a given clinical scale 
profile may be viewed in quite different 
perspectives as a function of variation in 
the underlying content components which 
determine profile elevation. The interpreta- 
tive significance of content scale configura- 
tions cannot be taken at face value, and a 
great deal more research and experience 
with these scales must precede any recom- 
mendations for clinical application. Curios- 
ity concerning the nature of patients’ com- 
munications to us would seem to be a 
healthy interest, however, and some may 
prefer to adopt the MMPI content scales as 
the most promising procedure for satisfy- 
ing this interest. It is hoped that such in- 
terim applications would be strictly supple- 
mental to the tried and, occasionally, true 
procedures for clinical scale interpretation. 


Discussion 


To encourage further investigation of the
empirical properties of the content scales 
is to imply that they possess advantages 
over the currently employed clinical scales.
Since such a position is taken by the present 
investigator, it seems appropriate to re- 
view these claimed advantages and to dis- 
cuss their relevance for both clinical and 
research application of the MMPI. 

Viewed from the convenient hindsight of 
25 years, the MMPI appears to have been 
poorly conceived for the purposes it was 
eventually to serve. The Kraepelinian cate- 
gories to which it was committed were soon 
to pass into disfavor. Moreover, the pre- 
dictive success of the individual scales in 
making such psychiatric categorizations was 
considerably less than had been anticipated. 
Under the impetus of an unprecedented 
amount of research, there was a shift of 
emphasis from the psychiatric to the “per- 
sonological” implications of the clinical 
scales and the application of the scales was 
extended far beyond the original context 
of personnel decisions.

The MMPI clinical scales are poorly 
equipped to serve as personality trait scales 
for several reasons. Several of the scales 
lack the internal consistency which is usu- 
ally taken as evidence of an organized pat- 
tern of behavior. Also, an interpretative 
ambiguity exists with respect to the mean- 
ing and significance of low scores on the 
scales since “normal” subjects rarely 
achieve a score of zero (Wiggins, 1962, pp. 
226-227). Indeed, the hodgepodge of con- 
tent which contributes to a high score on 
a given clinical scale is not suggestive of 
any consistent personality trait or struc- 
ture. The fact that this makes the in- 
ventory difficult to fake would seem, at 
best, a mixed blessing. Given the substan- 
tive heterogeneity of the clinical scales, a 
configural “pattern” may be achieved in a 
wide variety of ways, and it seems cavalier 
to apply standard “blind” interpretations 
to such patterns, as is done in clinical prac- 
tice. Finally, a minor but irritating char- 
acteristic of the scales is the extensive 
degree of item overlap which exists among 
them (Adams & Horn, 1965; Shure & 
Rogers, 1965). 

It seems likely that the MMPI item 
pool, which was once considered so rich and 
untapped, may be too limited as a source of 
items for building general-purpose per- 


sonality scales (Wiggins & Goldberg, 1965). 
This may be true with respect both to con- 
tent and item characteristics, and is cer- 
tainly true of the extent to which the two 
are confounded. Nevertheless, in the ab- 
sence of any immediate replacement, it 
would seem unwise to abandon an inventory 
that has the empirical virtues, however 
limited, of the MMPI. Rather, it would 
seem appropriate to explore the utility of 
supplemental measures which are not encumbered
by all the substantive and psychometric
shortcomings of the clinical

The MMPI content scales possess a re- 
spectable degree of internal consistency. 
This internal consistency must, in part, be 
attributed to homogeneous organizations of 
psychological, physical, and social com- 
plaints which seem appropriately combined 
by a cumulative scoring model (Loevinger, 
1957, pp. 664-666). Although no claim is 
made for scale unidimensionality or Gutt- 
man-type item properties, each scale has a 
compelling, though prosaic, feature. Sub- 
jects who achieve high scores on the scales 
do so by admitting to, or claiming, an un- 
usual amount of the substantive dimension 
involved. Subjects who achieve low scores 
claim a small amount and, by so doing, may 
or may not be similar to certain abnormal 
groups. But subjects who say they are 
hostile are saying just that and not that 
they have organic symptoms or strong reli- 
gious convictions. A return to this type of 
Woodworthian simplicity has been long 
overdue. 
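
The cumulative scoring model referred to above is simple enough to state as code. The following Python sketch is illustrative only: the keyed items are those listed for the Social Maladjustment (SOC) scale in the item lists of the appendix, whereas the function `cumulative_score` and the example protocol are hypothetical.

```python
# Scoring key for the Social Maladjustment (SOC) content scale (27 items),
# copied from the appendix item list.
SOC_TRUE = [52, 171, 172, 180, 201, 267, 292, 304, 377, 384, 453, 455, 509]
SOC_FALSE = [57, 91, 99, 309, 371, 391, 449, 450, 479, 482, 502, 520, 521, 547]

def cumulative_score(responses, keyed_true, keyed_false):
    """Count responses given in the keyed direction.

    `responses` maps item number to True or False; omitted items add nothing.
    """
    score = sum(1 for item in keyed_true if responses.get(item) is True)
    score += sum(1 for item in keyed_false if responses.get(item) is False)
    return score

# Hypothetical protocol: five keyed-true items answered True, three
# keyed-false items answered False, remaining keyed-false items answered True.
responses = {item: True for item in SOC_TRUE[:5]}
responses.update({item: False for item in SOC_FALSE[:3]})
responses.update({item: True for item in SOC_FALSE[3:]})

soc_raw = cumulative_score(responses, SOC_TRUE, SOC_FALSE)
```

Under this model a raw score is nothing more than the number of complaints of one kind that the respondent has admitted to, which is the sense in which a high score means an unusual amount of the substantive dimension involved.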

The present study was able to provide 
only very limited evidence bearing on the 
effectiveness of the content scales in dis- 
criminating among traditional psychiatric 
groups. However, the preliminary evidence 
obtained was not discouraging in this re- 
spect. Although the burden of proof is 
clearly on the content scales, the superiority 
of scales derived by a contrasted groups 
strategy need not be conceded a priori when 
populations other than the derivation 
samples are involved (Hase & Goldberg, 
1965). 

Although apparently heterogeneous in 
content, covariation among content scales 
may be reduced to three underlying factors.
The first two of these factors were
found to be colinear with the first two fac- 
tors consistently found in analyses of the 
MMPI clinical scales. This result is not 
surprising within the domain of MMPI 
items and may even reflect an upper limit 
on the number of parsimoniously inter- 
pretable factors within the conventionally 
defined questionnaire realm (Peterson, 
1965). However, the content scales tend to 
clarify the specific manner in which the 
ubiquitous two factors of personality ques- 
tionnaires manifest themselves within the 
MMPI item pool. The item content of the 
scales which mark these two factors lends 
itself readily to the substantive interpreta- 
tions placed upon these dimensions by 
Block (1965). This is especially important 
when it is recognized that Block's inter- 
pretations were buttressed by independently 
obtained empirical evidence. 

Coming from the same item pool, the 
content scales are no freer than the
clinical scales from confounding item characteristics
that lend themselves to stylistic
interpretations. However, the tenor of re- 
cent critical thinking on this issue is such 
as to suggest that the burden of proof of 
the utility of stylistic interpretations has 
been shifted to the proponents of such 
styles. In any event, it seems clearer what 
is being confounded by item characteristics 
in the case of the content scales. Future
studies of item characteristics would do well
to examine their effects on substantive 
dimensions rather than on the poorly understood
dimensions yielded by the scale construction
strategy of contrasted groups.
Such research would naturally be facilitated 
by scales composed of nonoverlapping 
items. 

The case for further investigation of sub- 
stantive aspects of the MMPI may best 
be presented by calling attention to a basic 
feature of assessment situations which has 
tended to be ignored or belittled by sophistic
arguments. Regardless of psychologists’ 
views of a test response, the respondent 
tends to view the testing situation as an 
opportunity for communication between 
himself and the tester or institution he rep- 
resents (Leary, 1957). Obviously, the re- 
spondent has some control over what he 
chooses to communicate, and there are a 
variety of other factors which may enter to 
“distort” the message, many of them attributable
to the testing medium itself (Cattell,
1961; LaForge, 1963). Nevertheless, recog- 
nition of such sources of “noise” in the 
system should not lead us to overlook the 
fact that a message is still involved. The 
MMPI content scales may be closely 
attuned to this message and as such may 
provide a useful supplement to the stand- 
ard clinical scales. 
Item Lists for MMPI Content Scales

SOC  Social maladjustment (27 items)
True: 52, 171, 172, 180, 201, 267, 292, 304, 377, 384, 453, 455, 509
False: 57, 91, 99, 309, 371, 391, 449, 450, 479, 482, 502, 520, 521, 547

DEP  Depression (33 items)
True: 41, 61, 67, 76, 94, 104, 106, 158, 202, 209, 210, 217, 259, 305, 337, 338, 339, 374, 390, 396, 413, 414, 487, 517, 518, 526, 543
False: 8, 79, 88, 207, 379, 407

FEM  Feminine interests (30 items)
True: 70, 74, 77, 78, 87, 92, 126, 132, 140, 149, 203, 261, 295, 463, 538, 554, 557, 562
False: 1, 81, 219, 221, 223, 283, 300, 423, 434, 537, 552, 563

MOR  Poor morale (23 items)
True: 84, 86, 138, 142, 244, 321, 357, 361, 375, 382, 389, 395, 397, 398, 411, 416, 418, 431, 531, 549, 555
False: 122, 264

REL  Religious fundamentalism (12 items)
True: 58, 95, 98, 115, 206, 249, 258, 373, 483, 488, 490
False: 491

AUT  Authority conflict (20 items)
True: 59, 71, 93, 116, 117, 118, 124, 250, 265, 277, 280, 298, 313, 316, 319, 406, 436, 437, 446
False: 294

PSY  Psychoticism (48 items)
True: 16, 22, 24, 27, 33, 35, 40, 48, 50, 66, 73, 110, 121, 123, 127, 136, 151, 168, 184, 194, 197, 200, 232, 275, 278, 284, 291, 293, 299, 312, 317, 334, 34, 345, 348, 349, 350, 364, 400, 420, 683, 448, 476, 511, 551
False: 198, 347, 464

ORG  Organic symptoms (36 items)
True: 23, 44, 108, 114, 156, 159, 161, 186, 189, 251, 273, 332, 335, 541, 560
False: 46, 68, 103, 119, 154, 174, 178, 178, 185, 187, 188, 190, 192, 243, 274, 281, 330, 405, 496, 508, 540

FAM  Family problems (16 items)
True: 21, 212, 216, 224, 226, 239, 245, 325, 327, 421, 516
False: 65, 96, 137, 220, 527

HOS  Manifest hostility (27 items)
True: 28, 39, 80, 89, 109, 129, 139, 145, 162, 218, 269, 282, 336, 355, 363, 368, 393, 410, 417, 426, 438, 447, 452, 468, 469, 495, 536

PHO  Phobias (27 items)
True: 166, 182, 351, 352, 360, 365, 385, 388, 392, 473, 480, 492, 494, 499, 525, 553
False: 128, 131, 169, 176, 287, 353, 367, 401, 412, 522, 539

HYP  Hypomania (25 items)
True: 13, 134, 146, 181, 196, 228, 234, 238, 248, 200, 268, 272, 296, 340, 342, 372, 381, 386, 409, 439, 445, 465, 500, 505, 506

HEA  Poor health (28 items)
True: 10, 14, 29, 34, 72, 125, 279, 424, 519, 544
False: 2, 18, 36, 51, 55, 63, 130, 153, 155, 163, 193, 214, 230, 462, 474, 486, 533, 542



APPENDIX B

TABLE B1
SOCIAL MALADJUSTMENT

Mean   (SD)    Group                                        N
16.71  (5.97)  Personality pattern disturbance (OP)         17
13.48  (8.50)  Brain disorders                              23
12.80  (8.33)  Psychoneurotic disorders (OP)                15
11.92  (7.12)  Personality trait disturbance (OP)           36
 9.83  (7.65)  Special symptom reaction (OP)                6
 9.83  (5.41)  Psychoneurotic disorders                     13
 9.75  (5.42)  Transient situational disturbance (OP)       8
 9.57  (5.42)  Air Force normals                            201
 9.55  (4.80)  Schizophrenic psychoses                      85
 9.33  (5.93)  Sociopathic personality disturbance          46
 8.35  (5.87)  Affective psychoses                          20
 8.21  (6.75)  Sociopathic personality disturbance (OP)     19
 8.16  (5.50)  College normals                              96
 8.12  (4.87)  Personality trait disturbance                17
 6.33  (5.14)  Personality pattern disturbance              15
 5.75  (4.04)  Brain disorders (OP)                         16
TABLE B2
DEPRESSION

Mean   (SD)    Group                                        N
16.12  (4.94)  Personality pattern disturbance (OP)         17
14.17  (9.56)  Personality trait disturbance (OP)           36
13.65  (8.92)  Brain disorders                              23
12.58  (8.01)  Psychoneurotic disorders                     15
12.33  (7.28)  Special symptom reaction (OP)                6
11.16  (6.99)  Sociopathic personality disturbance (OP)     19
10.73  (9.02)  Psychoneurotic disorders (OP)                15
10.11  (7.01)  Sociopathic personality disturbance          46
 9.48  (6.39)  Air Force normals                            261
 9.00  (5.78)  Personality trait disturbance                17
 8.96  (6.50)  Schizophrenic psychoses                      85
 8.70  (5.36)  Affective psychoses                          20
 6.20  (5.40)  Personality pattern disturbance              15
 5.42  (4.72)  College normals                              96
 5.31  (5.16)  Brain disorders (OP)                         16
 4.13  (8.18)  Transient situational disturbance (OP)       8

TABLE B3
FEMININITY

Mean   (SD)    Group                                        N
12.39  (4.35)  Brain disorders                              23
11.81  (3.83)  Schizophrenic psychoses                      85
11.55  (3.65)  Affective psychoses                          20
11.05  (5.21)  Sociopathic personality disturbance (OP)     19
11.04  (3.66)  Sociopathic personality disturbance          46
10.80  (3.91)  Personality pattern disturbance              15
 9.98  (3.72)  Air Force normals                            201
 9.77  (4.25)  Psychoneurotic disorders                     13
 9.67  (2.16)  Special symptom reaction (OP)                6
 9.65  (3.06)  Personality trait disturbance                17
 9.16  (3.42)  College normals                              96
 8.33  (3.35)  Psychoneurotic disorders (OP)                15
 8.18  (3.41)  Personality pattern disturbance (OP)         17
 8.17  (4.49)  Personality trait disturbance (OP)           36
 7.94  (2.11)  Brain disorders (OP)                         16
 7.50          Transient situational disturbance (OP)       8
TABLE B4
POOR MORALE

Group
Personality pattern disturbance (OP)
Personality trait disturbance (OP)
Brain disorders
Sociopathic personality disturbance (OP)
Special symptom reaction (OP)
Sociopathic personality disturbance
Psychoneurotic disorders (OP)
Affective psychoses
Psychoneurotic disorders
Air Force normals
Schizophrenic psychoses
Personality trait disturbance
Brain disorders (OP)
Transient situational disturbance (OP)
College normals
Personality pattern disturbance


TABLE B5
RELIGION

Group
Special symptom reaction (OP)
Air Force normals
Brain disorders (OP)
Affective psychoses
Brain disorders
Sociopathic personality disturbance
Schizophrenic psychoses
Psychoneurotic disorders
College normals
Transient situational disturbance (OP)
Personality trait disturbance (OP)
Psychoneurotic disorders (OP)
Personality pattern disturbance
Personality trait disturbance
Personality pattern disturbance (OP)
Sociopathic personality disturbance (OP)
TABLE B6
AUTHORITY CONFLICT

Group
Sociopathic personality disturbance
Brain disorders
Sociopathic personality disturbance (OP)
Personality pattern disturbance (OP)
Personality trait disturbance (OP)
Air Force normals
Affective psychoses
Psychoneurotic disorders (OP)
Schizophrenic psychoses
Psychoneurotic disorders
Personality trait disturbance
Brain disorders (OP)
College normals
Special symptom reaction (OP)
Personality pattern disturbance (OP)
Transient situational disturbance (OP)


TABLE B7
PSYCHOTICISM

Group
Brain disorders
Personality pattern disturbance (OP)
Personality trait disturbance (OP)
Air Force normals
Sociopathic personality disturbance
Schizophrenic psychoses
Affective psychoses
Sociopathic personality disturbance (OP)
Psychoneurotic disorders
Psychoneurotic disorders (OP)
Personality pattern disturbance
Special symptom reaction (OP)
Brain disorders (OP)
Personality trait disturbance
College normals
Transient situational disturbance (OP)
TABLE B8
ORGANIC SYMPTOMS

Group
Personality pattern disturbance (OP)
Personality trait disturbance (OP)
Brain disorders
Psychoneurotic disorders (OP)
Special symptom reaction (OP)
Psychoneurotic disorders
Sociopathic personality disturbance
Air Force normals
Schizophrenic psychoses
Personality trait disturbance
Affective psychoses
Sociopathic personality disturbance (OP)
Personality pattern disturbance
Brain disorders (OP)
Transient situational disturbance (OP)
College normals


TABLE B9
FAMILY PROBLEMS

Group
Personality trait disturbance (OP)
Sociopathic personality disturbance (OP)
Personality pattern disturbance (OP)
Brain disorders
Sociopathic personality disturbance
Psychoneurotic disorders
Schizophrenic psychoses
Personality pattern disturbance
Personality trait disturbance
Air Force normals
Affective psychoses
Psychoneurotic disorders (OP)
Brain disorders (OP)
College normals
Special symptom reaction (OP)
Transient situational disturbance (OP)
TABLE B10
HOSTILITY

Group
Personality pattern disturbance (OP)
Personality trait disturbance (OP)
Air Force normals
Affective psychoses
Brain disorders
Sociopathic personality disturbance (OP)
Sociopathic personality disturbance
Psychoneurotic disorders (OP)
College normals
Schizophrenic psychoses
Psychoneurotic disorders
Special symptom reaction (OP)
Personality trait disturbance
Transient situational disturbance (OP)
Personality pattern disturbance
Brain disorders (OP)


TABLE B11
PHOBIAS

Group
Personality pattern disturbance (OP)
Special symptom reaction (OP)
Brain disorders
Sociopathic personality disturbance
Psychoneurotic disorders (OP)
Schizophrenic psychoses
Personality trait disturbance (OP)
Air Force normals
Affective psychoses
Sociopathic personality disturbance (OP)
Psychoneurotic disorders
Personality trait disturbance
Brain disorders (OP)
Personality pattern disturbance
College normals
Transient situational disturbance (OP)




TABLE B12
HYPOMANIA

Mean   (SD)    Group                                        N
14.94  (4.46)  Personality pattern disturbance (OP)         17
14.00  (4.89)  Sociopathic personality disturbance (OP)     19
13.80  (5.05)  Affective psychoses                          20
13.71  (5.17)  Sociopathic personality disturbance          46
13.61  (6.19)  Brain disorders                              23
13.29  (3.88)  Air Force normals                            261
12.89  (3.39)  Personality trait disturbance (OP)           36
12.67  (2.50)  Special symptom reaction (OP)                6
12.07  (4.51)  Psychoneurotic disorders (OP)                15
11.85  (5.55)  Psychoneurotic disorders                     13
11.74  (3.77)  College normals                              96
11.63  (5.67)  Schizophrenic psychoses                      85
11.63  (8.34)  Brain disorders (OP)                         16
11.41  (4.35)  Personality trait disturbance                17
10.38  (4.63)  Transient situational disturbance (OP)       8
 9.87  (4.47)  Personality pattern disturbance              15

TABLE B13
POOR HEALTH

Mean   (SD)    Group                                        N
 9.06  (4.53)  Personality pattern disturbance (OP)         17
 8.93  (5.97)  Psychoneurotic disorders (OP)                15
 8.25  (4.64)  Personality trait disturbance (OP)           36
 8.22  (5.18)  Brain disorders                              23
 8.17  (3.71)  Special symptom reaction (OP)                6
 7.46  (5.47)  Psychoneurotic disorders                     13
 6.90  (8.26)  Affective psychoses                          20
 6.71  (4.10)  Personality trait disturbance                17
 6.50  (4.69)  Schizophrenic psychoses                      85
 6.47  (4.40)  Sociopathic personality disturbance          46
 6.40  (4.05)  Air Force normals                            261
 5.42  (3.92)  Sociopathic personality disturbance (OP)     19
 4.67  (3.61)  Personality pattern disturbance              15
 4.00  (2.83)  Transient situational disturbance (OP)       8
 3.56  (2.68)  Brain disorders (OP)                         16
 3.17  (2.34)  College normals                              96
A series of 6 experiments investigated the principles required to account
quantitatively for responses of human Ss to new combinations of cues
following discrimination learning. In the training phase of each experiment,
Ss learned to make identifying responses (numerical labels) to sets
of stimuli (pairs of nonsense syllables) under a paired-associate procedure.
After a fixed number of acquisition trials Ss were tested on stimulus
compounds involving new combinations of the training cues. In all
experiments a substantial proportion of variance in the test data was
accounted for by a model embodying the additive rule and the probability
matching rule of stimulus sampling theory. In cases when training
had been conducted under standard discrimination paradigms, responses
to test compounds were quite well accounted for by this model without
auxiliary principles. Ss exhibited preferences for low ambiguity cues in
test compounds only when this preference had been differentially reinforced
during training. Under a variety of circumstances, predictions from
the stimulus sampling model could be improved by the addition of a
“relative novelty” principle, stating that, other things being equal, Ss
tend to sample from test compounds the cues that had occurred least
frequently during previous training.


In this study we are concerned with principles governing transfer of
response to new combinations of cues following discrimination training
under relatively simple stimulus conditions. In general, one expects
response to new situations to be determined jointly by stimulus
characteristics of the given situation and by the relevant learning
history. To reduce an overwhelmingly complex problem to manageable
proportions for preliminary quantitative analysis, the authors and their
associates have pursued the strategy of limiting consideration to
situations involving sets of simple, discrete, clearly discriminable cues,
while studying transfer as a function of frequencies of relevant events
during previous learning experiences.

This research was supported by Research Grants MH 02170 and MH 11792
from the National Institute of Mental Health of the National Institutes
of Health, United States Public Health Service, Contract Nonr-908(16)
between the Office of Naval Research and Indiana University, and
Contract Nonr-225(73) between the Office of Naval Research and Stanford
University. Reproduction in whole or in part is permitted for any
purpose of the United States Government.

In accord with the previously stated position of one of the authors
(Estes, 1959a, pp. 455-456), we shall take the viewpoint of the
experimenter in specifying what is meant by cues. Thus, they are those
aspects of a stimulus situation that are independently manipulated in a
given experimental context. In the examples that follow, the letters
a, b, c, etc. designate separate cues.

From a rather extensive series of studies conducted within this context,
two principles of some generality have emerged. The first of these is
the additive rule, which has been taken as a basic response axiom in
stimulus sampling models for learning (e.g., Atkinson & Estes, 1963;
Estes, 1959b). If each member of a population of cues has been uniformly
associated with some one member of a set of alternative responses during
training, then according to the additive rule the probability of a
response on a test trial is equal to the proportion of the cues present
on the test which have been associated with the given response. For
example, in an experiment reported by one of the authors (Atkinson &
Estes, 1963, p. 193), a response A1 was reinforced during training in
the presence of the set of cues abc and response A2 in the presence of
the set def; probability of response A1 on a test trial in the presence
of a new sample abd was predicted to be 2/3 by the additive rule
(and proved to be .669 in the data).
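The additive rule can be illustrated with a short calculation. The sketch below is only an illustration of the rule as stated in the text; the function name `additive_rule` and the dictionary layout are hypothetical and do not come from the original report.

```python
from fractions import Fraction


def additive_rule(test_cues, cue_to_response):
    """Additive rule: P(response) = proportion of the test cues
    that were associated with that response during training."""
    counts = {}
    for cue in test_cues:
        response = cue_to_response[cue]
        counts[response] = counts.get(response, 0) + 1
    n_cues = len(test_cues)
    return {resp: Fraction(c, n_cues) for resp, c in counts.items()}


# Training described in the text: cues a, b, c -> A1; cues d, e, f -> A2.
training = {"a": "A1", "b": "A1", "c": "A1",
            "d": "A2", "e": "A2", "f": "A2"}

# New test sample abd: two of its three cues were associated with A1.
prediction = additive_rule("abd", training)
print(prediction["A1"], prediction["A2"])
```

The predicted value for A1 is 2/3, to be compared with the observed proportion of .669 cited above.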

The second principle, which may be 
termed the matching rule, has arisen from 
two quite different lines of investigation— 
studies of probability learning (e.g., Estes, 
1957, 1964; Estes & Straughan, 1954) and 
studies of frequency effects in visual recog- 
nition situations (Binder, 1963; Binder & 
Feldman, 1960). According to the matching 
rule, if two or more different responses have 
been reinforced in the presence of a given 
cue during training, whether this cue has 
appeared alone or as a component of more 
complex patterns, then on any later test the 
probability that any one of these responses 
will be evoked by the given cue is equal to 
its relative frequency of reinforcement. To 
illustrate in terms of the Binder and Feldman
(1960) study, subjects (Ss) first learned
discriminations between combinations of
geometric figures which we shall denote, for
convenience, by small letters, a, b, c, etc. Following
training in which a combination of
cues ac always had A1 as the reinforced response
and the combination ad always had
A2 as the reinforced response, but with the
first combination occurring twice as often
as the second in each training block, Ss
were tested on cue a alone. Predicted response
frequencies for the group of Ss, according
to the matching rule, were 34.7 A1's
to 17.3 A2's; the observed frequencies were
36 A1's and 16 A2's (Binder & Feldman,
1960, pp. 11-12). Numerous studies have
provided quantitative support for these
principles when transfer is tested following
standard discrimination or paired-associate
training (e.g., Binder, 1963; Binder & Feld- 
man, 1960; Estes, Burke, Atkinson, & 
Frankmann, 1957; Feldman, 1963; Peter- 
son, 1956; Schoeffler, 1954), with some re- 
cent studies adding the qualification that 
cues which covary during training may 
come to act as units, or “elements,” in car- 
rying transfer effects (Estes & Hopkins, 
1961; Friedman, 1966; Friedman & Gel- 
fand, 1964). 
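The matching-rule prediction quoted for the Binder and Feldman (1960) example can be reproduced as follows. This is a minimal sketch; the function name `matching_rule` is hypothetical, and the total of 52 test responses is inferred from the observed frequencies (36 + 16) given in the text.

```python
from fractions import Fraction


def matching_rule(reinforcement_counts):
    """Matching rule: P(response | cue) equals the relative frequency
    with which that response was reinforced in the cue's presence."""
    total = sum(reinforcement_counts.values())
    return {resp: Fraction(n, total)
            for resp, n in reinforcement_counts.items()}


# Per training block, cue a occurs twice with A1 reinforced (in ac)
# and once with A2 reinforced (in ad).
probs = matching_rule({"A1": 2, "A2": 1})

n_test = 36 + 16  # total observed responses to cue a alone
expected = {resp: float(p * n_test) for resp, p in probs.items()}
print({resp: round(freq, 1) for resp, freq in expected.items()})
```

Rounded to one decimal, the expected frequencies agree with the 34.7 and 17.3 reported in the text.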


In the series of experiments to be reported,
we have tried to provide more strin-
gent tests of these two principles of trans- 
fer, and at the same time to assess various 
auxiliary rules that have been proposed, 
by introducing various modifications into 
standard training paradigms. The various 
auxiliary hypotheses will be spelled out in 
connection with the specific experiments 
designed to test them. To anticipate the 
principal results, we may indicate at the 
outset that the studies will provide further 
support for the matching and additive 
rules, yield only negative evidence with re- 
spect to all of the other hypotheses we had 
in mind at the outset of the investigation,
and uncover the operation of a “relative
novelty principle” which has been important
in other theoretical contexts (e.g.,
Broadbent, 1958) but had not previously 
been suspected, by us at least, to be in- 
volved in the types of transfer situations 
under consideration. 

The experiments follow a common over- 
all plan in which Ss first learn identifying 
responses to sets of stimuli by a paired-as- 
sociate procedure, then after a fixed num- 
ber of acquisition trials, are tested on new 
stimulus compounds comprising two or 
more of the training cues recombined in 
various ways. Thus every test stimulus is a
compound including components of previ- 
ously learned stimuli. Learning conditions 
and cue relationships are manipulated ex- 
perimentally and relative frequencies of re- 
sponses to test combinations are observed. 
An important aspect of our procedures is 
that Ss are never prepared for the transfer 
tests by any special instructions. Thus re- 
sults of the transfer tests may be taken as 
evidence regarding what was learned during 
discrimination training, without contami- 
nation by higher order problem-solving 
strategies or the like. 


EXPERIMENT I 


In this and the subsequent experiments, 
we shall take as a base-line predictions de- 
rivable from the matching rule and the ad- 
ditive rule, which, within the context of 
stimulus sampling theory, characterize the 
component model (Estes, 1959b). System- 
atic deviations from these predictions may 
be taken to indicate effects of other factors
or relationships. This first experiment was 
contrived as a broad spectrum exploration 
of several factors whose effects, if any, are 
confounded in standard discrimination- 
transfer experiments. The stimulus response 
relationships during training are most sim- 
ply portrayed by means of the schema in 
Table 1, the actual cues being represented 
by small letters a through f and the re- 
sponses by the letters i, j, and k. 

The design may be regarded as compris- 
ing two standard discrimination tasks 
which are learned concurrently by Ss. In 
the first of the two discriminations, Ss must
learn to make response i to stimulus combination
ab and response j to stimulus combination
ac, the discrimination thus involving
two stimulus patterns which have an element,
a, in common. The second discrimination
involves learning to make response i
to compound de and response k to df, again
the discrimination involving two patterns 
with a common cue. 

To the extent that the component model 
correctly represents learning in this situa- 
tion, we should expect that at the end of 
training, cues b, c, e, and f would be associated
with responses i, j, i, and k, respectively;
that cue a would be equally likely to be
associated with i and with j; and that cue
d would be equally likely to be associated
with i and with k. If, then, the S were tested
on a new combination ae, the predicted response
probabilities to the compound
would be .75 for response i and .25 for response
j. The basis for the prediction is as
follows. On the test with ae, the S is equally
likely to sample either of the two component
cues. If he samples cue e, this necessarily
leads to response i, but if he samples a
it leads to responses i and j with equal
probabilities; thus the total predicted probability
is .25 for j and .25 + .50 = .75 for i.
Predictions for other test combinations 
may be similarly calculated for the compo- 
nent model and will be presented and dis- 
cussed in connection with the results of the 
testing phase. 
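The calculation just illustrated for compound ae extends mechanically to the other test compounds. The sketch below is offered only as an illustration under stated assumptions: the training pairs and test compounds are those described for Experiment I, every training compound is assumed to occur equally often, and the function names (`cue_response_probs`, `predict`) are hypothetical.

```python
from fractions import Fraction

# Training design of Experiment I: compound -> reinforced response.
TRAINING = {"ab": "i", "ac": "j", "de": "i", "df": "k"}
TEST_COMPOUNDS = ["ae", "be", "ad", "dc", "bc", "cf"]


def cue_response_probs(training):
    """Matching rule per cue, assuming each training compound is
    presented equally often."""
    counts = {}
    for compound, response in training.items():
        for cue in compound:
            cue_counts = counts.setdefault(cue, {})
            cue_counts[response] = cue_counts.get(response, 0) + 1
    return {cue: {r: Fraction(n, sum(c.values())) for r, n in c.items()}
            for cue, c in counts.items()}


def predict(compound, cue_probs):
    """Component model: each cue in the compound is sampled with equal
    probability, then responds according to its own probabilities."""
    prediction = {}
    for cue in compound:
        for response, p in cue_probs[cue].items():
            prediction[response] = (prediction.get(response, Fraction(0))
                                    + p / len(compound))
    return prediction


cue_probs = cue_response_probs(TRAINING)
predictions = {c: predict(c, cue_probs) for c in TEST_COMPOUNDS}
for compound, pred in predictions.items():
    print(compound, {r: float(p) for r, p in sorted(pred.items())})
```

For ae this gives .75 for i and .25 for j, matching the worked example; responses absent from a compound's output have predicted probability zero.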

We are well aware that the component 
model, taken by itself, cannot possibly 
handle the combined results of the learning 
and testing phases of this experiment. For, 


TABLE 1
DESIGN OF EXPERIMENT I

Training syllable                  Testing
combinations         Responses     combinations
ab                   i             ae
ac                   j             be
de                   i             ad
df                   k             dc
                                   bc
                                   cf


if Ss responded solely in terms of associa- 
tions between the separate cues and the cor- 
rect response, they could never reach a cri- 
terion of 100% correct responding during 
discrimination training. Even at the asymp- 
tote of learning, the probability of a correct 
response, to, say, ab would be only .75 ac- 
cording to the same reasoning just sketched 
in connection with test combination ae. It is 
well known that simple discriminations like 
these will readily be mastered to a strict 
criterion of 100% proficiency by human Ss 
(or, for that matter, monkeys, rats, or 
pigeons) and thus that any discrimination- 
transfer theory must involve some addi- 
tional principle. 

In some respects the simplest augmented 
theory is the mixed model, first proposed by
Estes and Hopkins (1961) and developed 
quantitatively by Atkinson and Estes (1963, 
pp. 243-249). According to the mixed model, 
during discrimination training, associations 
are formed not only between the various in- 
dividual cues, such as a, b, and c, and the 
correct response, but also between pattern 
cues and the correct response. Moreover, 
pattern cues, once learned, dominate the 
lower order component cues. Considering 
only the first row of Table 1, according to
the mixed model, at the end of training the
S would have formed associations between
the pattern cue ab and response i, between
component cue a and either response i or
response j, and between component cue b
and response i, but would be responding
solely in terms of the pattern cue and thus
making 100% correct responses upon presentations
of the training combination ab.
When tests on new compounds are given
after discrimination training, the training
and testing situations have no pattern cues
in common and thus with regard to predictions
about behavior on the transfer tests,
the mixed model reduces to the component
model, predictions being generated as illustrated
above.

In another type of augmentation of the 
component model, associated with the dis- 
crimination theories of Restle (1955) and 
Atkinson (1961), among others, it is as- 
sumed that associations between component 
cues and reinforced responses develop dur- 
ing training much as assumed in the com- 
ponent model but that S also learns to 
respond selectively to cues which are re- 
liable predictors of reinforcing events. For 
example, Ss learn to select or attend to cues 
b and c in the first two rows of Table 1, and 
to ignore or “adapt to” common cues, such 
as cue a in the first two rows of Table 1, 
which are not uniformly correlated with 
reinforcement of any one response during 
training. If this latter type of theory is 
correct in essentials, one might expect the 
adaptation or selective perception to carry 
over to some extent from training to testing 
situations, thus leading S to be less likely 
to sample ambiguous cues than unam- 
biguous cues when they occur in test com- 
pounds. If, for example, S were more likely 
to sample cue e than cue a in test compound 
ae, one would expect his response probabili- 
ties to deviate from those predicted by the 
component model, presumably in the direction
of a higher probability of response i
than allowed for by the model. 
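The direction of this expected deviation can be made concrete with a small calculation. The sketch below is an illustrative assumption, not a model taken from the text: it introduces a hypothetical parameter `p_sample_e`, the probability that S samples the unambiguous cue e rather than the ambiguous cue a in compound ae, and shows how the probability of response i changes with it.

```python
def response_probs_ae(p_sample_e):
    """Response probabilities to compound ae when cue e is sampled with
    probability p_sample_e (hypothetical parameter).

    Cue e always leads to response i; cue a leads to i or j equally often.
    """
    p_i = p_sample_e * 1.0 + (1.0 - p_sample_e) * 0.5
    return {"i": p_i, "j": 1.0 - p_i}


# Equal sampling reproduces the component-model prediction.
baseline = response_probs_ae(0.5)

# A preference for the unambiguous cue raises the probability of response i.
biased = response_probs_ae(0.8)
print(baseline, biased)
```

With equal sampling the value is .75, the component-model prediction; any sampling bias toward cue e yields a value above .75.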

Thus our plan is first to evaluate the 
test data against predictions from the com- 
ponent model and then to use any signifi- 
cant deviations from component model pre- 
dictions as indicators of other perceptual or 
response processes not taken into account 
in the mixed model. 


METHOD 


Subjects. The 80 Ss were Indiana University 
undergraduate students who participated in this 
experiment to fulfill a requirement of their course 
in introductory psychology. 

Apparatus. The stimulus materials were repro- 
duced on 2 X 2 inch slides and projected by two 
random access projectors onto a matte screen. The 
sequence of slides projected was controlled by two
banks of switches; one bank was used for each
block of learning trials. Thus, the experimenter
was able to program for two blocks in advance.
Four booths were arranged in an arc so that Ss
who sat in them could see the stimuli on the screen 
equally well. The booths were lined with acoustical 
tile on the sides and top, and contained desks in 
their rear parts. The space in the rear above the 
desk, bounded by sides and top, was open so that 
the projection screen could be seen. There was a 
hole in the center of each desk, and a roll of 
Esterline-Angus paper passed directly under the 
hole so that S could write his response on the pa- 
per. Movement of the paper was coordinated with 
stimulus presentation by a cam timer.

Stimulus materials. Pairs of nonsense syllables 
were used as stimuli and numbers as responses. The
basic list of syllables used to form the pairs con-
sisted of VOR, GAK, CYQ, ZIR, TEF, and FUH. These
were taken from Archer’s (1960) listing and have 
association values between 31% and 38%. The 
number responses were the digits 1, 2, and 3. 

Procedure. There were two phases to the experi- 
ment, an acquisition phase and a testing phase. 
During acquisition, Ss learned syllable-number 
combinations by anticipation with correction. Stim- 
ulus exposure was 7 seconds, response exposure 3 
seconds, intertrial interval 3 seconds, and delay
after warning buzzer 1 second. All times were
within ±5% tolerance.

Each learning block consisted of eight paired 
associates, and the learning phase proceeded to a 
fixed number of eight blocks. Since each stimulus 
consisted of two nonsense syllables, both the right- 
left and left-right order of these syllables were 
presented in each block. Moreover, during the 
first block the two orders of each syllable com- 
bination were always presented on successive trials 
although the order of presentation was randomized 
over different pairs. Beginning with the second 
block, the order of occurrence of each stimulus and
its associated response was determined completely
randomly for each even-numbered trial block. The
reverse order of presentation to that of the immediately
preceding block was used for each odd-numbered block.

A 1-minute rest period separated these eight
paired-associate blocks and the subsequent testing 
phase. The Ss were told: “In this phase of the ex- 
periment several new syllables will be presented. 
Each of these new syllables will be somewhat dif- 
ferent from the ones to which you learned to asso- 
ciate the numbers. Also, during this new phase of 
the experiment the syllables shown in the first part 
of the experiment will appear on the screen. When
these old syllables appear, their numbers will also
appear just as in the first part of the experiment;
however, when new syllables appear, no number
will appear in association with them.” They were
further told to “respond to old syllables just as you
did in the first part of the experiment” and to give
that response to new syllables “which you think is
most closely associated.”

The first part of the testing phase was simply 
another learning block of randomly ordered paired 
associates, in the same manner as the last eight 
blocks. Following this block came further blocks
of paired associates as before, but interspersed 
among the trials of these blocks were the test fig- 
ures consisting of novel compounds of the non- 
sense syllables shown earlier. These novel com- 
pounds were exposed for 7 seconds and no numbers 
were shown in association with them. There were 
12 such test compounds and these were inter- 
spersed randomly among the four blocks of paired 
associates which followed the initial block of the 
testing phase. The Ss were forced to respond to all 
test compounds. 

The use of the stimulus materials is illustrated 
in Table 1. The single letters in the syllable com- 
bination column refer to nonsense syllables which 
were assigned randomly to the letters from the 
pool listed above; thus ab might for a given group 
of four Ss be FUH GAK. Numbers were assigned ran-
domly to the letters in the response column in a
similar manner, so that i, j, k could take on the
values 1, 2, or 3. 

The learning blocks and the number of test 
trials were actually twice as long as the number of 
entries in each column since, in both the training 
and the testing phases, the forward (e.g. ab) and 
the reverse (ba) of each syllable combination were 
shown to each S. 


RESULTS AND DISCUSSION 


A summary of the responses of all Ss in 
Experiment I to the test compounds may be 
seen in Table 2. The heading for the two 
columns of frequencies for a given response 
includes a designation of the stimulus com- 
binations with which that response was 
associated during training. The left-hand 
entry under each subspanner heading, under 
the column head marked “Both,” shows the 
number of responses to both left-right ar- 
rangements of the test figure given in the 
stub column. Thus, each of the 80 Ss con- 
tributes two tallies over a grouping of these 
entries, where a grouping refers to the three 
entries under “Both” in a single row. For 
example, in Table 2, 109 of the responses to 
the test figures ae and ea (both given to all 
Ss) were the numbers previously associated 
with ab, ba, de, and ed (coded as i, which
would be 1, 2, or 3 depending upon the out- 
come of the randomization); similarly 36 
of the responses to ae and ea were the num- 
ber associated with ac and ca, and 15 the 
number associated with df and fd. 

The right-hand entry under each sub- 
spanner heading (column headed “First”) 
gives the number of times the indicated re- 
sponse was given when the test figure shown 


TABLE 2

TEST RESPONSES OF ALL SUBJECTS IN EXPERIMENT I

           Response (and associated training cues)

Test          i(ab-de)         j(ac)           k(df)
compound    Both  First    Both  First    Both  First
ae           109    53       36    18       15     9
be           128    62       16     8       16    10
ad            76    41       50    19       34    20
dc            52    23       84    43       24    14
bc            57    28       88    45       15     7
cf            16     5       61    32       83    43


to the extreme left in the given row (or its 
reverse) occurred the first time in the test- 
ing phase. For this purpose, ae and ea, for 
example, are considered equivalent, and 
only the one of these which occurred first 
was included in this tabulation. 
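The tallying scheme just described implies simple consistency checks: with 80 Ss, the three “Both” entries in a row should sum to 160 and the three “First” entries to 80. The sketch below applies these checks to the Table 2 frequencies and converts the “Both” tallies to observed proportions. It assumes that the i “First” entry for compound ad is 41, a value inferred from the row total because the printed entry is incomplete; the variable names are hypothetical.

```python
N_SUBJECTS = 80

# compound: (i_both, i_first, j_both, j_first, k_both, k_first)
TABLE_2 = {
    "ae": (109, 53, 36, 18, 15, 9),
    "be": (128, 62, 16, 8, 16, 10),
    "ad": (76, 41, 50, 19, 34, 20),  # 41 is inferred from the row total
    "dc": (52, 23, 84, 43, 24, 14),
    "bc": (57, 28, 88, 45, 15, 7),
    "cf": (16, 5, 61, 32, 83, 43),
}

both_totals = {c: row[0] + row[2] + row[4] for c, row in TABLE_2.items()}
first_totals = {c: row[1] + row[3] + row[5] for c, row in TABLE_2.items()}

# Observed response proportions based on both arrangements of each compound.
observed = {
    c: {"i": row[0] / both_totals[c],
        "j": row[2] / both_totals[c],
        "k": row[4] / both_totals[c]}
    for c, row in TABLE_2.items()
}

for compound in TABLE_2:
    print(compound, both_totals[compound], first_totals[compound],
          {r: round(p, 3) for r, p in observed[compound].items()})
```

Each row should print totals of 160 and 80; the proportions can then be set beside the component-model predictions.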

Since, in all phases of all experiments, 
each stimulus combination involved was 
presented to each S in both left-right ar- 
rangements (ae and ea, etc.), we shall avoid 
circumlocution throughout the remainder of 
this presentation by using a single label 
(e.g., ae) to represent both arrangements of 
a pair of cues except when a distinction be- 
tween the two arrangements is specifically 
intended. 

Since there is ample evidence that the 
phenomena associated with choice of re- 
sponse to ambiguous cues are functions of 
the level of previous learning achieved, 
separate analyses were made on the basis of 
grouping by error rates over asymptotic 
trials. Using the distribution of errors to the 
original syllable compounds which were 
made during the testing phase, Ss were di- 
vided into two groups on the basis of num- 
ber of errors so that as nearly as possible 
50% were in the high group and 50% in the 
low group. This division led to a grouping 
of 41 with four or fewer errors (the numbers 
of Ss with 0, 1, 2, 3, and 4 errors, respec- 
tively, being 13, 10, 9, 4 and 5) and 39 with 
five or more errors (range 5-27). Table 3 is 
like Table 2 except that the upper part of 
Table 3 contains only the data for Ss with 
four or fewer errors and the lower part of 
Table 3 the data for Ss with five or more 
errors. 

In order to compare the pattern of test 
TABLE 3

TEST RESPONSES OF HIGH- AND LOW-ERROR SUBJECTS IN EXPERIMENT I

           Response (and associated training cues)

Test          i(ab-de)         j(ac)           k(df)
compound    Both  First    Both  First    Both  First

Low-error Ss
ae            67    33       12     6        3     2
be            74    36        5     3        3     2
ad            45    23       22    10       15     8
dc            19     9       53    28       10     4
bc            28    15       50    24        4     2
cf             4     1       34    17       44    23

High-error Ss
ae            42    20       24    12       12     7
be            54    26       11     5       13     8
ad            31    18       28     9       19    12
dc            33    14       31    15       14    10
bc            29    13       38    21       11     5
cf            12     4       27    15       39    20


results with that predicted by the compo- 
nent model, we have first calculated a 
priori theoretical values on the assumption 
that test behavior was strictly a function of 
the combination of cues presented. Follow- 
ing the procedure illustrated in the intro- 
duction to this experiment, we obtained 
theoretical response proportions, then con- 
verted these to the theoretical frequencies 
exhibited in Table 4. The values in the left- 
hand portion of the table are to be com- 
pared with those given in the columns 
labeled “Both” in Table 2, and those in the 
right-hand portion with those given under 
“Both” in the upper half of Table 3. 

It is apparent at a glance, firstly, that the 
gross patterns of observed values are re- 
flected in the a priori predictions, and 
secondly, that there are some substantial 
quantitative disparities. Among the latter, 
the most annoying, from the standpoint of 
our present purposes, are the instances of 
rather large frequencies in cells for which 
zeros are predicted. These may be termed 
cases of “inappropriate responding,” for 
the expectation that the cells in question 
should be empty follows, not only from the 
component model, but from any theory 
which assumes test behavior to be deter- 
mined by the previous learning experiences. 
Further analysis shows that the greater 


part of the inappropriate responding is at- 
tributable to a number of high-error Ss who 
evidently respond essentially at random 
on test trials. Thus we shall confine the re- 
mainder of this analysis to the low-error 
Ss, for whom the frequencies of inappro- 
priate responding are relatively small. 

To obtain the best baseline from which to 
evaluate deviations from component model 
predictions in the case of the low-error Ss, 
we must correct the predictions given in 
Table 4 for the observed level of inappro- 
priate responding. In effect, we assume that 
on some proportion p of test trials, Ss’ be- 
havior was determined by the cues pre- 
sented and on the remaining proportion 
1 — p by irrelevant aspects of the test situa- 
tion. To estimate p, we set up the observa- 
tion equations 


79/82 = p + (1 − p)(2/3),

74/82 = p + (1 − p)(1/3),
and

78/82 = p + (1 − p)(2/3)


on the basis of the data for tests on ae, be,
bc, and cf (the last two being identical),
sum both sides of these equations, and solve
for p, obtaining


p = .861.
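
The estimate of p can be checked numerically. The sketch below assumes that the four observation equations (ae, be, bc, and cf) are summed with equal weight, and that the chance level of an appropriate response is the number of appropriate responses divided by the three available responses (2/3, 1/3, 2/3, and 2/3, respectively). The function name and data layout are illustrative and are not taken from the original analysis.

```python
from fractions import Fraction

def estimate_p(observations):
    """Solve sum(obs) = n*p + (1 - p)*sum(chance) for p.

    observations: list of (observed proportion, chance proportion) pairs,
    one pair per observation equation.
    """
    n = len(observations)
    total_obs = sum(obs for obs, _ in observations)
    total_chance = sum(chance for _, chance in observations)
    return (total_obs - total_chance) / (n - total_chance)

# Appropriate responses among the 82 test responses of low-error Ss,
# paired with the assumed chance level for each test figure.
equations = [
    (Fraction(79, 82), Fraction(2, 3)),  # ae: responses i or j
    (Fraction(74, 82), Fraction(1, 3)),  # be: response i only
    (Fraction(78, 82), Fraction(2, 3)),  # bc: responses i or j
    (Fraction(78, 82), Fraction(2, 3)),  # cf: responses j or k
]

p_hat = estimate_p(equations)
print(round(float(p_hat), 3))  # 0.861
```

Exact fractions are used only to keep the arithmetic transparent; ordinary floats give the same value to three decimals.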


This estimate was used to compute the
adjusted theoretical frequencies for the
low-error Ss shown in Table 5. Test figures
be, bc, and cf, in which each component cue
had been associated with only one response
during training, and figure ad, in which each
cue had been associated with two different
responses, clearly yield no significant deviations
from the component model predictions.


TABLE 4

A PRIORI PREDICTIONS OF TEST RESPONSES IN
EXPERIMENT I FROM COMPONENT MODEL

                          Response
Test         All subjects        Low-error subjects
figure      i     j     k         i      j      k
ae         120    40     0       61.5   20.5    0
be         160     0     0       82      0      0
ad          80    40    40       41     20.5   20.5
dc          40    80    40       20.5   41     20.5
bc          80    80     0       41     41      0
cf           0    80    80        0     41     41


TABLE 5

ADJUSTED COMPONENT MODEL PREDICTIONS OF TEST RESPONSES FOR LOW-ERROR SUBJECTS
IN EXPERIMENT I

                                 Response
Test               i                     j                     k
figure    Theoretical Observed  Theoretical Observed  Theoretical Observed
ae           56.7        67        21.4        12         3.9         3
be           74.4        74         3.8         5         3.8         3
ad           39.2        45        21.4        22        21.4        15
dc           21.4        19        39.2        53        21.4        10
bc           39.1        28        39.1        50         3.8         4
cf            3.8         4        39.1        34        39.1        44
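
The adjustment of the a priori predictions can be sketched as follows. The code assumes that on a proportion p of trials responses follow the a priori component model frequencies, and that on the remaining trials responses are spread evenly over the three alternatives. The function name is hypothetical, and the results agree with the tabled theoretical values only to within about 0.1, presumably because of rounding in the original.

```python
def adjusted_frequencies(a_priori, p, n_responses=3):
    """Mix cue-determined and random responding for one test figure.

    a_priori: a priori theoretical frequencies (one per response).
    p: estimated proportion of trials governed by the test cues.
    """
    total = sum(a_priori)
    chance = total / n_responses  # random responses spread evenly
    return [p * f + (1 - p) * chance for f in a_priori]

P_HAT = 0.861  # estimate reported in the text

# A priori low-error frequencies (i, j, k) from Table 4.
table4_low = {
    "ae": (61.5, 20.5, 0.0),
    "be": (82.0, 0.0, 0.0),
    "ad": (41.0, 20.5, 20.5),
}

for figure, freqs in table4_low.items():
    adjusted = adjusted_frequencies(freqs, P_HAT)
    print(figure, [round(x, 1) for x in adjusted])
```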

Test figure bc provides an opportunity to 
assess any effect of response frequency (re- 
gardless of stimulus) during training, since 
the response associated with b was rein- 
forced twice as often on training trials as 
the response associated with c. However, 
there proves to be no trace of any preference 
for the former response on bc test trials, the 
observed deviation from equality, though 
probably insignificant (see below), being in 
the opposite direction. 

For the two figures, ae and dc, in which
one cue had been associated with a single 
response and one with two responses during 
training, further analysis is required. In 
each of these instances, the frequency of the 
response associated with the “unambiguous” 
cue (i.e., the one with a single reinforced
response during training) was appreciably 
in excess of the component model prediction. 
Since each S contributed two responses to 
each test figure, a standard statistical test 
of these discrepancies, assuming Bernoulli 
trials, is not appropriate. We can, however, 
use such a test on the data for the "First" 
columns of Table 3, with the component 
model frequencies in Table 5 all divided by 
2. The χ² of 2.84 computed on this basis
for figure ae is not significant at the .05 level
(and the same is true for be, bc, cf, and ad).
For figure dc, the χ² of 8.06 has a probability
between .01 and .02 with 2 df.
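
As a rough check on these statistics, the sketch below computes a Pearson χ² from the First responses of the low-error Ss (Table 3) against the Table 5 theoretical frequencies divided by 2. The helper name is illustrative; the computed values come out close to, but not exactly equal to, the reported 2.84 and 8.06, which is consistent with rounding in the tabled frequencies.

```python
def chi_square(observed, expected):
    """Pearson chi-square statistic for matched lists of frequencies."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# "First" responses (i, j, k) of low-error Ss, from Table 3.
first_obs = {"ae": (33, 6, 2), "dc": (9, 28, 4)}

# Adjusted theoretical frequencies from Table 5; they are halved below
# because each S contributes only one first response per test figure.
theoretical = {"ae": (56.7, 21.4, 3.9), "dc": (21.4, 39.2, 21.4)}

for figure in ("ae", "dc"):
    expected = [t / 2 for t in theoretical[figure]]
    print(figure, round(chi_square(first_obs[figure], expected), 2))
```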

The fact that test responses deviate significantly
from the component model patterns
for dc but not for ae is of some interest.
In the former case, one might characterize 
the deviation in terms of a preference for the 


less ambiguous, or more valid, cue in the 
test figure, either of which seems intuitively 
reasonable enough. But in the case of ae, 
the same preference should be operating, 
and one would think it should be enhanced 
since response i had been associated with
both the test cues during training. However,
not too much can be made of this disparity
since the two χ² values are close to, though
on opposite sides of, the boundary of the
.05 critical region. Moreover, the power of
χ² (defined exactly at the limit and approximately
for fixed sample size) is a function
of degrees of freedom.

We next proceed to explore somewhat 
different experimental paradigms, in which 
any tendencies toward selective sampling of 
test cues on the basis of ambiguity or 
validity might be expected to show up more 
clearly. 


EXPERIMENT II


TABLE 6
DESIGN OF EXPERIMENT II

Training syllable                   Testing syllable
combinations        Responses       combinations
ab                  i               ad
ac                  j               bd
bc                  k               cd
dz                  l


A new arrangement of stimulus-response
relationships, summarized in Table 6, was
contrived to permit any preference for
more valid, or less ambiguous, cues to operate
on test trials, while eliminating the
possibility that some of the cues involved in
the transfer tests might have been selectively
ignored during training. The design
for the training phase included the first two 
stimulus-response pairs of Experiment I, to- 
gether with an item in which the nonover- 
lapping cues of the first two combinations 
were paired with a new response. Clearly, 
Ss could not reach perfect performance on 
these three items if they systematically 
avoided sampling any of the cues, and hence 
we could assume that by the end of training 
associations would necessarily have been 
formed between cue a and responses i and 
j; cue b and responses i and k; and cue c
and responses j and k; all of these cues being 
equal with respect to ambiguity and valid- 
ity. The fourth training combination com- 
bined a cue d with a cue, denoted by z in 
Table 6, which was different from one pres- 
entation of the combination to the next. 
The consistent component, d, necessarily 
associated with the response l by the end
of training, was paired on test trials with 
each of the cues from the other three com- 
binations. If total response frequencies 
proved to deviate significantly from component
model predictions in the same direction
observed in the case of test combinations
ae and dc of Experiment I, we hoped
to be able to conclude that the factor re- 
sponsible was differential cue ambiguity, or 
validity, per se, and not a difference in the 
consistency with which more or less am- 
biguous cues had been sampled by Ss on 
training trials. 


METHOD


Subjects. Sixty-two Ss, all students of introduc- 
tory psychology, were used. 

Apparatus. The stimuli were programmed and 
presented as in Experiment I, and booths were 
identical to those described above except for the 
surfaces of the desks. Instead of the hole, each 
desk contained a box with eight Plexiglas windows 
and a push-button below each of these windows. 
There was also a small light between each button 
and window to provide a signal for the S that his 
response had registered in the apparatus. The but- 
ton-box assembly covered the writing hole since 
the Esterline-Angus paper was not used. The but- 
tons pushed by Ss were recorded on paper tape by 
a Friden punch. 

The possible responses were lettered on the 
Plexiglas windows, behind which were a series of 
bulbs. The bulbs served to indicate the periods 


during which the Ss could respond and were under 
the control of the cam-timer. 

Stimulus materials. As before, pairs of nonsense 
syllables were used as stimuli and numbers as re- 
sponses. There were two basic lists of syllables: 
The first, List A, consisted of VOR, GAK, CYQ, ZIR,
TEF, and FUH, and the second, List B, of QAN, BYM,
KEC, YIN, WOX, and CUV. (All had 31-38% association
value according to Archer, 1960.) The responses
were the digits 1, 2, 3, 4.

Procedure. There were again acquisition and 
testing phases, with sequencing, timing, instruc- 
tions, and method of presentation as in Experi- 
ment I. Each learning block consisted of eight 
paired-associate presentations, and training pro- 
ceeded for eight blocks. Then six test trials were 
given, randomly interspersed among two blocks of 
training trials which followed an initial retraining 
block in the testing phase. A restriction was im- 
posed upon this random interspersion such that 
exactly four tests occurred in the first and two in 
the second of the final two blocks of the testing 
phase. The purpose of this arrangement was to en- 
sure that the state of learning of cue-response rela- 
tionships would not suffer any appreciable reten- 
tion loss over the test series. 

The letter z, appearing in the bottom row of 
Table 6, differed in status from the other letters in 
that the particular nonsense syllable substituted 
for it varied from block to block. If FUH was randomly
assigned to a, for example, FUH occurred
for a in every block of training trials; but in the 
case of z, a group of syllables was assigned, one of 
these occurring in each block and all occurring dur- 
ing the course of learning. 

The syllables from List A were randomly as- 
signed to the letters, a, b, c, d, and the syllables 
from List B were assigned to z. Five of the syllables
from List B were so assigned for any one S.
Since there were 11 blocks of pairings in all in 
Experiment II (8 during acquisition and 3 during 
testing), four of these z's were repeated once and 
one was repeated twice. The digits from 1 to 4 were 
randomly assigned to i, j, k, and l.

Unlike Experiment I, the nature of response re- 
cording made it necessary to permit a failure to 
respond to test compounds. 


RESULTS AND DISCUSSION 


The summary of test responding given in 
Table 7 is like that in Table 2 except for 
the addition of a “no response" column. 
Each entry in this last column gives the 
total number of times Ss failed to respond 
when the test figure of that row was shown. 
On the basis of the distribution of errors 
made to the training figures during the test- 
ing phase, a low-error group of 32 Ss and a 
high-error group of 30 Ss were defined, the 
range for the former being 0-7 and for the 




latter 8-17. The breakdown of test responses
by error group is given in Table 8.


TABLE 7
TEST RESPONSES OF ALL SUBJECTS IN EXPERIMENT II

                 Response (and associated training cues)
Test        i(ab)       j(ac)       k(bc)       l(dz)     No response
figure    Both First  Both First  Both First  Both First  Both First
ad         11    7      8    3      6    2     96   48      3    2
bd          9    5      8    2     11    3     90   49      6    3
cd          6    3      6    3     18    9     89   46      5    1
Pooled     26   15     22    8     35   14    275  143     14    6

Since cues a, b, and c are symmetrical in 
the logical structure of the design (i.e. each 
having two associated responses during 
training), it is appropriate to pool the fre- 
quencies in each response category of Tables 
7 and 8 over the three test figures. When 
this is done, we observe that of 372 re- 
sponses (including failures) given by all 
Ss in the Both category, 275, or 74%, were 
instances of the response associated with 
cue d during training; the corresponding 
percentages for low- and high-error Ss were 
82 and 66, respectively. 
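
The pooled percentages can be reproduced directly from the tabled counts. In the sketch below, the totals for the two error groups (192 and 180) are not stated in the text; they are assumed from group size × 2 responses × 3 test figures. The function name is illustrative.

```python
def percent_d(d_responses, total_responses):
    """Percentage of test responses that were the response trained to cue d."""
    return 100 * d_responses / total_responses

# Pooled "Both" frequencies of response l and total test responses
# (including failures), from the Pooled rows of Tables 7 and 8.
pooled = {
    "all Ss": (275, 372),
    "low-error Ss": (157, 192),   # 32 Ss x 2 responses x 3 figures (assumed)
    "high-error Ss": (118, 180),  # 30 Ss x 2 responses x 3 figures (assumed)
}

for group, counts in pooled.items():
    print(group, round(percent_d(*counts)), "% (component model: 50%)")
```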

Since according to the component model 
only 50% of the test responses should have 
been instances of the response associated 
with cue d, it is clear that under the condi- 
tions of this experiment, Ss’ preference for 
the relatively unambiguous cue was much 


more pronounced than in Experiment I. 
Owing to the design of the training series, 
this preference presumably could not be 
the result of Ss having learned to ignore the 
more ambiguous cues. However, in Experiment
II, the unambiguous cue, d, differed
from the other test cues also in that it had 
not been part of a regularly recurring pat- 
tern during training. Thus the possibility is 
suggested that Ss were, in effect, reinforced 
for responding to cue d whenever it ap- 
peared in a new compound. The next ex- 
periment is designed to rule out any such 
contingency.


TABLE 8
TEST RESPONSES OF LOW- AND HIGH-ERROR SUBJECTS IN EXPERIMENT II

                 Response (and associated training cues)
Test        i(ab)       j(ac)       k(bc)       l(dz)     No response
figure    Both First  Both First  Both First  Both First  Both First
Low-error Ss
ad          3    3      2    1      3    2     55   25      1    1
bd          4    4      1    0      5    1     51   25      3    2
cd          1    1      5    2      6    3     51   25      1    1
Pooled      8    8      8    3     14    6    157   75      5    4
High-error Ss
ad          8    4      6    2      3    0     41   23      2    1
bd          5    1      7    2      6    2     39   24      3    1
cd          5    2      1    1     12    6     38   21      4    0
Pooled     18    7     14    5     21    8    118   68      9    2


EXPERIMENT III


TABLE 9
DESIGN OF EXPERIMENT III

Training syllable                   Testing syllable
combinations        Responses       combinations
ab                  i               af
ac                  j               ae
ad                  k               ef
eb                  l               bc
ed                  m               ce
fc                  n               cd
                                    bf
                                    df


The design of this experiment, summarized
in Table 9, ensures that, as in Experiment
II, Ss cannot be reinforced for selectively
ignoring any of the test cues during
training. Further, in the new paradigm, all
cues are components of regularly recurring
patterns during training, and thus all should 
be equally likely to be sampled when they 
appear in test compounds. It was our in- 
tention that the only differentiation among 
the cues which might prove relevant to test 
behavior would occur along the dimension 
of ambiguity, or validity. To permit more 
thorough study of this variable, the rela- 
tions between cues and reinforced re- 
sponses were such that one cue (a) would 
be associated with three different responses, 
four cues (b, c, d, and e) with two different 
responses each, and one cue (f) with a sin- 
gle response. Then cues at each level of 
ambiguity were paired with cues at each of 
the other levels during the testing phase. 


METHOD


Subjects. A total of 61 Ss was used, all recruited 
from sections of the introductory course. 

Apparatus. The arrangement of booths and the
surfaces of the desks were as in Experiment II, but
the projector-screen method of stimulus presentation
and the switch-bank method of programming
were not used. Instead, the stimuli were presented 
by means of digital display units manufactured by 
Industrial Electronics, Incorporated. Each such 
unit is capable of displaying any 1 of 12 different 
stimuli by having a bulb turned on behind a film 
containing the stimulus of interest; a lens arrange- 
ment bends the light so that all stimuli fall in the 
same location on the viewing surface. The se- 
quence of stimulus events was programmed by a 
Friden tape reader and Ss’ responses were recorded 
by use of the Friden punch, as in Experiment II.

Procedure. The overall procedure discussed un- 
der Experiment I and Experiment II was followed 
with the schema shown in Table 9. There were 12 
paired associates in each acquisition block and a 
total of eight acquisition blocks. 

Another training block was given during the 
first part of the testing phase and then four more 
blocks with interspersed test compounds. There 
were 16 such tests and these were randomly inter- 
spersed under the restriction of exactly four tests 
per block. The syllables from List A (Experiment
II) were randomly assigned to the letters a through
f (Table 9) and the digits 1-6 to the letters
i through n. 


RESULTS AND DISCUSSION 


Test responses for all Ss are summarized 
in Table 10, and for low-error and high- 
error Ss separately in Tables 11 and 12, 
respectively. The ranges of errors on train- 
ing figures during the testing phase were 
3-20 for the 30 low-error Ss and 21-58 for 
the 31 high-error Ss. 

TABLE 10
TEST RESPONSES OF ALL SUBJECTS IN EXPERIMENT III

                          Response (and associated training cues)
Test       i(ab)      j(ac)      k(ad)      l(eb)      m(ed)      n(fc)    No response
figure   Both First Both First Both First Both First Both First Both First Both First
af        15    6    20   10    12    6     5    2     5    2    57   29     8    6
ae        27   12    16    9    12    8    27   14    27   14     5    2     8    2
ef         7    5    10    5    11    5    26   14    14    6    49   23     5    3
bc        24   10    18    8    17    9    24   11     7    5    23   12     9    6
ce        10    4    20    9     7    4    23   13    25   14    31   13     6    4
cd         9    5    15   10    29   16     7    5    24    9    30   12     8    4
bf        21   15     8    2    10    5    15    7     9    5    56   26     3    1
df         6    2     3    2    22   11     9    6    18    8    62   30     2    2


TABLE 11
TEST RESPONSES OF LOW-ERROR SUBJECTS IN EXPERIMENT III

                          Response (and associated training cues)
Test       i(ab)      j(ac)      k(ad)      l(eb)      m(ed)      n(fc)    No response
figure   Both First Both First Both First Both First Both First Both First Both First
af         9    3     9    4     3    2     1    0     1    1    35   18     2    2
ae         8    4     9    4     4    3    19    9    18    9     0    0     2    1
ef         1    1     5    3     4    2    13    6     4    2    32   15     1    1
bc        16    6    10    5     8    4     8    4     2    2    13    7     3    2
ce         4    1     9    4     3    2    14    7    14    8    15    7     1    1
cd         3    2     5    3    20   13     2    1     9    4    17    5     4    2
bf        14    9     1    0     3    2     5    2     2    1    33   15     2    1
df         1    1     0    0    17   10     2    2    10    4    29   12     1    1


There is a total of 49 response failures
among the 976 tests for all Ss, giving a relative
frequency of .050. The corresponding
relative frequency for the low-error Ss is
.033. As can be seen in Tables 10 and 11,
the distribution of response failures over
test figures is rectangular, except perhaps
for the compounds in which f is paired with 
a cue of intermediate ambiguity (e, b, d). 
But even in such cases the slight differences 
would make it reasonable to assume, when 
testing component model predictions, that 
the likelihood of response failure is ade- 
quately represented by a single probability 
common to all test figures for a given S 
grouping. 

For convenience in discussing the various 
tests, we shall refer to a as the high-ambi- 
guity (H) cue, f as the low-ambiguity (L) 
cue, and b, c, d, and e as intermediate 
ambiguity (I) cues. Test figures involving 
the same ambiguity combinations will be 
grouped together for analysis: 


H-L : af
H-I : ae
I-L : ef, bf, df
I-I : bc, ce, cd


Further, we shall designate appropriate re- 
sponses to a test figure (those reinforced to 
either component during training) by the 
letter A, other responses by O, and response 
failures by N. Within the A class, + and — 
will denote the responses associated with 
the less and more ambiguous cues, re- 
spectively. 

We may quickly dispose of the I-I category,
to which little theoretical interest
attaches. As expected, the division of A responses
between the two cues was approximately
equal, 48%, over all Ss, being instances
of j or n, the responses associated
with cue c. For some reason not obvious to
us, within this class, the Ss showed a rather
marked preference (84 n's to 53 j's) for the
response associated with the less ambiguous
of the two cues paired with c during training.


TABLE 12
TEST RESPONSES OF HIGH-ERROR SUBJECTS IN EXPERIMENT III

                          Response (and associated training cues)
Test       i(ab)      j(ac)      k(ad)      l(eb)      m(ed)      n(fc)    No response
figure   Both First Both First Both First Both First Both First Both First Both First
af         6    3    11    6     9    4     4    2     4    1    22   11     6    4
ae        19    8     7    5     8    5     8    5     9    5     5    2     6    1
ef         6    4     5    2     7    3    13    8    10    4    17    8     4    2
bc         8    4     8    3     9    5    16    7     5    3    10    5     6    4
ce         6    3    11    5     4    2     9    6    11    6    16    6     5    3
cd         6    3    10    7     9    3     5    4    15    5    13    7     4    2
bf         7    6     7    2     7    3    10    5     7    4    23   11     1    0
df         5    1     3    2     5    1     7    4     8    4    33   18     1    1

Considering now the A responses made 
by all Ss on tests involving differential 
ambiguities, for the H-L tests 55% of these
responses were in the A+ category, for H-I, 
50%, and for I-L, 59%. Corresponding 
values for the low-error Ss were 62%, 64%, 
and 60%, respectively. There would, thus,
seem to be some preference for the less 
ambiguous cues, particularly among the 
low-error Ss. However, these percentages are 
not directly comparable with each other 
or with an expectation of 50% since an a 
priori expectation of 50% A+ and 50% A− responses
within the A category is based on
the assumption that all responses are de- 
termined by Ss’ learning histories relative 
to the test cues. Since there were, in fact, 
incidences of inappropriate responses and 
response failures, adjusted theoretical val- 
ues for the component model must be com- 
puted. 

Except for the necessity of dealing with 
the N category, the analysis proceeds as in 
the case of Experiment I. For the data of all 
Ss, in the Both category, we have already 
estimated the probability of a response 
failure on any test, p_N, to be .050. Letting
p denote probability of sampling the rele- 
vant cues on test trials, as in Experiment I, 
we can set up the observation equations: 


For H-L
.852 = .950 [p + (1 − p)2/3],
for H-I
.893 = .950 [p + (1 − p)5/6],
for I-L
.773 = .950 [p + (1 − p)1/2],
and for I-I
.781 = .950 [p + (1 − p)2/3].


In each instance, the quantity on the left is 
the observed proportion of A responses and 
the factor .950 on the right is 1 − p_N. The
multiplier of (1 — p) in each equation is 
the probability that a response occurring by 
chance (i.e., not determined by the test cues) 
will fall in the A category. Adding these 


equations (with proper weighting) and solving
for p, we obtain the estimate p̂ = .581,
and using this value we arrive at the com- 
ponent model predictions for the differential 
ambiguity tests. 


H-L: P(A+) = .950 [.581/2 + .419/6] = .342
H-I: P(A+) = .950 [.581/2 + .419/3] = .408
I-L: P(A+) = .950 [.581/2 + .419/6] = .342,


to be compared with corresponding ob- 
served values of .467, .442, and .456, re- 
spectively. Thus, the deviations from the 
component model occur in the expected 
order from largest to smallest: H-L, I-L, 
H-I. 
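
The estimation and prediction steps for all Ss can be sketched as follows. Two points are assumptions rather than statements from the text: "proper weighting" is taken to mean weighting each class by its number of test figures (1, 1, 3, 3), and the H-I proportion used, .893, is 109/122 as tallied from the Both columns of Table 10. The function names are hypothetical.

```python
P_N = 0.050  # estimated probability of a response failure on any test

# class: (observed proportion of A responses,
#         chance level of an A response among six responses,
#         number of test figures in the class, used as the weight)
classes = {
    "H-L": (0.852, 2 / 3, 1),
    "H-I": (0.893, 5 / 6, 1),
    "I-L": (0.773, 1 / 2, 3),
    "I-I": (0.781, 2 / 3, 3),
}

def estimate_p(classes, p_n):
    """Solve the weighted sum of obs = (1 - p_n)[p + (1 - p)chance] for p."""
    weight = sum(w for _, _, w in classes.values())
    obs = sum(w * o for o, _, w in classes.values()) / (1 - p_n)
    chance = sum(w * c for _, c, w in classes.values())
    return (obs - chance) / (weight - chance)

def predict_a_plus(p, p_n, chance_a_plus):
    """P(A+): half of the cue-determined responses plus the chance share."""
    return (1 - p_n) * (p / 2 + (1 - p) * chance_a_plus)

p_hat = estimate_p(classes, P_N)
print("p estimate:", round(p_hat, 3))

# Chance share of A+: responses tied to the less ambiguous cue, out of six.
for label, chance in (("H-L", 1 / 6), ("H-I", 1 / 3), ("I-L", 1 / 6)):
    print(label, round(predict_a_plus(0.581, P_N, chance), 3))
```

Under these assumptions the estimate agrees with the reported .581, and the predictions agree with the reported values to within about .001.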

A similar analysis for low-error Ss only 
yields a much larger estimate of the proba- 
bility of sampling the relevant cues on test 
trials, p̂ = .751, and component model predictions
of


H-L: P(A+) = .404 
H-I: P(A+) = .445 
I-L: P(A+) = .404, 


to be compared with observed proportions 
of .583, .617, and .522, respectively. The 
order in this case is: H-L, H-I, I-L, with 
little difference between the first two. 

It is difficult to support all of the com- 
parisons of interest with formal significance 
tests, owing to the multiple observations 
per S. However, χ² (on the conditional
space) values for the comparison of A+ and 
A— frequencies in the First data with com- 
ponent model predictions are significant at 
the .05 level or beyond for H-L and I-L 
comparisons. Considering also the uniform- 
ity of the deviations from the component 
model base line, one can scarcely avoid con- 
cluding that our Ss tended to base their test 
responses on the less ambiguous cue in the 
test figure. Comparison of these results with 
those of Experiment II suggests that
this preference is augmented when the training
conditions provide differential reinforcement
for responding to an unambiguous
cue, but that the preference may be
substantial even in the absence of such 
contingencies. 


EXPERIMENT IV


It will be recalled that the test series of 
Experiment I included a compound ad, of 
which cue a had been associated with re- 
sponses i and j, and cue d with responses 
i and k during training. We had anticipated, 
on intuitive rather than formal theoretical 
grounds, that Ss would exhibit a differential 
preference for the common response, i, on
the ad test. Something of the sort appears to 
occur in many nonlaboratory situations. 
If, for example, a patient has several symptoms,
a, b, c, ..., each of which is associated
with two or more different diseases
but all consistent with some one disease X
(e.g., a is associated with X and Y, b with
X and W, c with X and Z, ...), one would
surely expect a diagnostician to select X
with high probability. Yet little evidence 
for any such effect was manifest in the data 
of Experiment I. 

One possible interpretation is that selec- 
tive response on the basis of a principle of 
response communality, or confirmation, is 
not a general characteristic of discrimina- 
tion learning, but must be established in 
particular situations by special instructions 
or training. However, it is possible also that 
the conditions of Experiment I were un- 
favorable to manifestation of a response 
confirmation principle, perhaps because 
cues a and d were in a sense irrelevant to 
the discriminations learned during training. 
Thus we propose in the present experiment 
to arrange a more thorough test, using a 
training paradigm completely balanced with 
respect to cue validity and response fre- 
quency. 


METHOD


Subjects and apparatus. Ninety-three Ss from 
the introductory psychology course were run in 
the same apparatus used in Experiment III. 

TABLE 13
DESIGN OF EXPERIMENT IV

Training syllable                   Testing syllable
combinations        Responses       combinations
ab                  i               ad
ac                  j               bc
db                  k               ef
dc                  i               gh
eg                  l
eh                  j
fg                  k
fh                  l


Procedure. The training list included 16 paired-associate
items, each presented once per block.
The stimuli were syllable pairs conforming to the
paradigm of Table 13 (with each pair appearing
in both left-right orders). Stimulus lists A and B
of Experiment II were pooled and eight syllables
selected for random assignment to the positions
denoted by the letters a-h in Table 13. Responses 
were the digits 1, 2, 3, and 4, randomly assigned 
to the letters i-l in Table 13. The pairing of sylla- 
ble combinations with responses was such that the 
list could not be completely learned if any of the 
cues were selectively ignored during training. Further,
each component cue was associated with
exactly two responses during training, and the test 
combinations were so chosen that each test pro- 
vided an opportunity for operation of any tend- 
ency to respond on the basis of response com- 
munality. 

The learning phase comprised 12 blocks, with 
the 16 items occurring in random order in each 
block. At the start of the testing phase, one addi- 
tional training block was given; then the eight 
test compounds were interspersed through two 
training blocks, the positions being random except 
that exactly four tests occurred in each block. 


RESULTS AND DISCUSSION 


Full-test data are given for all Ss in 
Table 14 and for low- and high-error sub- 
groups (with 47 and 46 Ss, respectively) 
in Table 15. Since all test compounds are 
symmetrical with respect to the experimen- 
tal design, the data can conveniently be 
pooled, using theoretically significant re- 
sponse categories as in Experiment III. In 
this instance, the category A+ will denote 
the appropriate response to a test compound 
which is “correct” according to the notion 
of response communality (response i to ad, 
l to ef, etc.), A— any other appropriate re- 
sponse, O an inappropriate response, and 
N a response failure. 

TABLE 14
TEST RESPONSES OF ALL SUBJECTS IN EXPERIMENT IV

                 Response (and associated training cues)
Test      i(ab-dc)    j(ac-eh)    k(db-fg)    l(eg-fh)    No response
figure   Both First  Both First  Both First  Both First   Both First
ad        95   47     38   15     37   23      9    4       7    4
bc        75   37     51   25     40   21     11    4       9    6
ef        12    9     30   17     49   25     87   38       8    4
gh        15    8     40   20     33   15     87   45      11    5


Taking first the pooled frequencies for
all Ss shown in the upper portion of Table
16, we obtain directly p_N = 35/744 = .047
as our estimate of the probability of a response
failure on any trial. Then, entering
the appropriate values in

P(A) = (1 − p_N)[p + (1 − p)3/4],

we obtain

662/744 = .953 [p + (1 − p)3/4],


which yields the estimate p = .736 for the 
probability that any test response is deter- 
mined by the relevant cues. Using these 
estimates, we have computed the component 
model predictions given in the upper por- 
tion of Table 16. The same analysis for 
low-error Ss yields the parameter estimates 
p_N = .056, p = .844, and the component
model predictions presented in the lower
portion of Table 16.
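
The Table 16 predictions can be approximated with the sketch below. It assumes that each test figure pairs two cues sharing one of the four responses, so that cue-determined responding yields A+ and A− with probability 1/2 each, and that chance responding is spread evenly over the four responses (one A+, two A−, one O). The low-error total of 376 tests is taken as 47 Ss × 8 test responses. The function name is illustrative.

```python
def component_predictions(n_tests, p, p_n):
    """Expected A+, A-, O, and N frequencies under the component model.

    n_tests: total number of test responses (including failures).
    p: probability that a response is determined by the test cues.
    p_n: probability of a response failure.
    """
    responding = n_tests * (1 - p_n)  # tests on which a response was made
    return {
        "A+": responding * (p / 2 + (1 - p) / 4),
        "A-": responding * (p / 2 + (1 - p) / 2),
        "O": responding * (1 - p) / 4,
        "N": n_tests * p_n,
    }

all_ss = component_predictions(n_tests=744, p=0.736, p_n=0.047)
low_error = component_predictions(n_tests=376, p=0.844, p_n=0.056)

for label, pred in (("All Ss", all_ss), ("Low-error Ss", low_error)):
    print(label, {k: round(v, 1) for k, v in pred.items()})
```

The printed values fall within one response of the theoretical Both entries in Table 16; the small differences are consistent with rounding of the parameter estimates.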

On the basis of the comparisons planned 
in advance of the experiment, i.e., those
available in Table 16, we must conclude that 
there are only slight and statistically in- 
significant deviations from the component 


model base line in the direction of a prefer- 
ence for the A+ category. On an a pos- 
teriori basis, however, an interesting ob- 
servation emerges regarding the trend over 
the test series. Since, as may be seen in 
Table 16, the response frequencies in the 
First category fell notably closer to com- 
ponent model predictions than did those in 
the Both category, the Second test responses 
must have deviated further. Examining the 
Second category for all Ss (obtainable by
subtraction in Table 16) we find 177 A+
responses compared to 154 predicted; and,
for the observed-theoretical comparison
over the A+ and A− entries only, we obtain
a χ² of 5.82, which is significant at the
.02 level. A similar analysis for low-error
Ss yields a χ² of 5.93. Thus, although Ss' behavior
on early tests conforms closely to
the component model, it is possible that a
tendency to respond in terms of response
communality develops as a function of continued
experience with admixed training and
test trials. 


TABLE 15
TEST RESPONSES OF HIGH- AND LOW-ERROR SUBJECTS IN EXPERIMENT IV

                 Response (and associated training cues)
Test      i(ab-dc)    j(ac-eh)    k(db-fg)    l(eg-fh)    No response
figure   Both First  Both First  Both First  Both First   Both First
Low-error Ss
ad        51   28     19    8     20   10      1    1       3    0
bc        37   16     29   16     19    9      3    1       6    5
ef         3    3     16   10     22   13     47   18       6    3
gh         7    3     18   11     14    6     49   23       6    4
High-error Ss
ad        44   19     19    7     17   13      8    3       4    4
bc        38   21     22    9     21   12      8    3       3    1
ef         9    6     14    7     27   12     40   20       2    1
gh         8    5     22    9     19    9     38   22       5    1


TABLE 16

POOLED TEST RESPONSES OF EXPERIMENT IV COMPARED WITH PREDICTIONS FROM
COMPONENT MODEL

                              Response type
                 A+               A−               O                N
Category   Observed Theor.  Observed Theor.  Observed Theor.  Observed Theor.
All Ss
  Both       344     308      318     354       47     47        35     35
  First      167     154      161     177       25     23        19     18
Low-error Ss
  Both       184     164      157     177       14     14        21     21
  First       85      82       83      88        8      7        12     11


EXPERIMENT V 


In virtually all studies of discrimination 
learning conducted with the classical (ab 
— 1; ac — 2) design, and in the first three 
experiments of the present series, the factor 
of cue ambiguity, or validity, has been un-
obtrusively but ubiquitously confounded 
with stimulus frequency. Consider, for ex- 
ample, the training paradigm of Experiment 
III (Table 9). Cue f, which proved to be 
the most potent controller of responding on 
transfer tests, occurred only once per train- 
ing block, whereas the most ambiguous cue, 
a, occurred three times, and those of inter- 
mediate ambiguity twice per block. 

As a step toward disentangling this con- 
founding, we propose in this experiment to 
maintain the same arrangements of train- 
ing cues and responses as in Experiment 
III, and to use the same transfer tests, but 
to change the relative frequencies of oc- 
currence of some of the stimulus-response 
combinations during training. Thus we shall 
be able to compare test responding to cues 
which differ in ambiguity with training 
frequencies equated; and to compare cues 
which are of similar ambiguity but differ 
with respect to training frequency. 


METHOD 


Subjects and apparatus. The 144 Ss all were 
students in introductory psychology. Apparatus 
was the same as that of Experiment III. 

Procedure. Training conditions are summarized 
in Table 17. The new feature is that, whereas in 
Experiment III each syllable combination and its 
associated response occurred once per training 
block, in the present experiment some combina- 
tions occurred more often than others. For Con- 
ditions F− (69 Ss) and F+ (75 Ss) there were 
20 and 26 trials, respectively, per training block, 
as compared to 12 in Experiment III, each left- 
right arrangement of syllable combinations being 
assigned the frequency per block shown in Table 
17. In Condition F−, any effects of training fre- 
quency should be reduced for tests involving the 
least ambiguous cue, f, whereas in Condition F+ 
they should be accentuated. The F− condition 
will provide instances both of tests with differential 
ambiguity but equal training frequency (e.g., ef) 
and of tests with similar ambiguity but unequal 
training frequency (e.g., bc). The F+ condition 
will provide tests of the second type (e.g., ce) and 
also tests on which effects of training frequency 
and ambiguity are strongly confounded (e.g., af). 
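
As an illustrative check on these design claims, the following sketch tallies how often each cue occurs per training block under Experiment III (labeled F= here) and under Conditions F− and F+, using the pair frequencies of Table 17. The pair written fc (response n) follows the response labels of the later tables; the function and variable names are ours, not part of the original procedure.

```python
from collections import Counter

# Frequency per block of each syllable pair (for one left-right arrangement).
# "F=" stands for Experiment III, where every pair occurred once.
PAIRS = ["ab", "ac", "ad", "eb", "ed", "fc"]
FREQ = {
    "F=": [1, 1, 1, 1, 1, 1],
    "F-": [4, 1, 1, 1, 1, 2],
    "F+": [4, 4, 2, 1, 1, 1],
}


def cue_counts(condition):
    """Occurrences of each cue per block, counting one left-right arrangement."""
    counts = Counter()
    for pair, n in zip(PAIRS, FREQ[condition]):
        for cue in pair:
            counts[cue] += n
    return counts


def trials_per_block(condition):
    """Each pair is shown in two left-right arrangements."""
    return 2 * sum(FREQ[condition])


for cond in FREQ:
    print(cond, trials_per_block(cond), dict(cue_counts(cond)))
```

Under these assumptions the block sizes come out to 12, 20, and 26 trials, and in Condition F− cues d, e, and f all occur equally often, which is what makes ef and df tests of ambiguity with training frequency equated.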

Training and testing were conducted as in Ex- 
periment III, except for the variations in makeup 
of training blocks. In the first training block each 
syllable pair appeared once, the two left-right 
arrangements occurring back-to-back. Then the 
differential stimulus frequencies shown in Table 
17 applied throughout the subsequent eight train- 
ing blocks, and throughout the four additional 
blocks of the testing phase, in each of which four 
test trials were interspersed among the training 
trials. In all blocks after the first, the order of 
stimulus presentation was entirely random. The 
syllables of List A (Experiment II) were assigned 
to the letters a through f in Table 17 and the 
digits 1-6 to the letters i through n. 


TABLE 17 

TRAINING CONDITIONS OF EXPERIMENT V 

Syllable                     Frequency per block 
combination   Response   Condition F−   Condition F+ 
ab               i             4              4 
ac               j             1              4 
ad               k             1              2 
eb               l             1              1 
ed               m             1              1 
fc               n             2              1 


RESULTS AND DISCUSSION 


Frequencies of test responses following 
the F− training condition are summarized 
in Table 18 and those following the F+ con- 
dition in Table 19. Similarly, Tables 20 and 
21 show the frequencies of test responses 
for the low-error Ss (34 in F− and 37 in 
F+) while Tables 22 and 23 show these 
frequencies for high-error Ss (35 in F− and 
38 in F+). As usual, deviations from the 
component model base line, and, in particu- 
lar, differences between A+ and A− fre- 
quencies, were greater for low-error than for 
high-error Ss. However, all of the principal 
trends to be discussed hold for both sub- 
groups, so only the combined data for all 
Ss will be used in subsequent discussion and 
comparisons. 

Direct comparisons of F− and F+ con- 
ditions with each other and with Experi- 
ment III (which might be denoted as an 
F= condition) can most conveniently begin 
by reference to Table 24. Here, test re- 
sponse frequencies over all Ss in Experiment 
III and in both conditions of Experiment 
V are shown jointly in percentage form 
(for the Both category). To facilitate theo- 
retical comparison, the data have been com- 
bined further in Table 25, where test re- 
sponse percentages for each of the response 
types defined in the analysis of Experiment 
III are given. The pooling in Table 25 is 
over all figures representing each differen- 
tial ambiguity combination. 


TABLE 18 
TEST RESPONSES OF ALL SUBJECTS IN EXPERIMENT V, CONDITION F− 

Response (and associated training cues) 

Test      i(ab)       j(ac)       k(ad)       l(eb)       m(ed)       n(fc)    No response 
figure  Both First  Both First  Both First  Both First  Both First  Both First  Both First 
af        11     9    27    14    23     8     9     4     4     2    49    28    15     4 
ae        16     8    13     4    16     9    52    25    23    12     4     4    14     7 
ef         2     1     7     4    13     9    37    19    26    12    44    19     9     5 
bc        21    11    35    17    11     3    18    10     7     4    33    15    13     9 
ce         6     3    25    12    10     5    36    16    24    14    29    16     8     3 
cd         2     1    26    14    31    15    16     9    30    13    26    13     7     4 
bf        26    16    17     5    11     4    17    11     4     3    54    25     9     5 
df         3     2     4     2    32    13     9     4    27    17    48    22    15     9 


TABLE 19 
TEST RESPONSES OF ALL SUBJECTS IN EXPERIMENT V, CONDITION F+ 

Response (and associated training cues) 

Test      i(ab)       j(ac)       k(ad)       l(eb)       m(ed)       n(fc)    No response 
figure  Both First  Both First  Both First  Both First  Both First  Both First  Both First 
af        10     6    12     6    10     6     3     3     2     1   108    51     5     2 
ae        14     4    12     6    19     7    56    29    39    24     2     0     8     5 
ef         3     1     3     1     2     1    19    12    20    10   101    49     2     1 
bc        42    17    49    23     7     2    22    13     7     5    21    13     2     2 
ce         4     3    41    24     6     3    45    26    27     7    19     8     8     4 
cd         5     0    31    12    57    33     7     4    27    15    13     5    10     6 
bf        26    14     4     3     2     0     9     4     5     2   100    49     4     3 
df         6     4     2     1    18     8     2     0    12     7   101    48     9     7 

TABLE 20 
TEST RESPONSES OF LOW-ERROR SUBJECTS IN EXPERIMENT V, CONDITION F− 

Response (and associated training cues) 

Test      i(ab)       j(ac)       k(ad)       l(eb)       m(ed)       n(fc)    No response 
figure  Both First  Both First  Both First  Both First  Both First  Both First  Both First 
af         4     3    12     7     9     3     2     1     1     0    33    19     7     1 
ae         5     3     5     2     6     3    30    13    12     6     3     3     7     4 
ef         0     0     3     3     1     1    20     9    10     6    31    13     3     2 
bc        10     6    20     7     4     2    10     7     1     1    19     9     4     2 
ce         2     1    17     7     2     1    21    10     8     5    15     9     3     1 
cd         0     0    17    10    13     5     5     4    16     7    14     6     3     2 
bf        10     7     9     2     1     0     9     6     2     2    33    15     4     2 
df         0     0     1     1    13     5     3     2    13     9    31    14     7     3 

TABLE 21 
TEST RESPONSES OF LOW-ERROR SUBJECTS IN EXPERIMENT V, CONDITION F+ 

Response (and associated training cues) 

Test      i(ab)       j(ac)       k(ad)       l(eb)       m(ed)       n(fc)    No response 
figure  Both First  Both First  Both First  Both First  Both First  Both First  Both First 
af         1     1     5     2     6     5     1     1     0     0    60    28     1     0 
ae         7     2     6     4     9     2    32    17    18    10     0     0     2     2 
ef         1     0     0     0     0     0    11     6     4     3    57    27     1     1 
bc        23     9    22    11     4     2    11     6     1     1    13     8     0     0 
ce         0     0    18    10     4     3    27    14    12     3    10     5     3     2 
cd         2     0     9     6    31    17     2     1    20    10     7     1     3     2 
bf        10     4     0     0     0     0     4     3     1     0    59    30     0     0 
df         2     1     1     1     7     2     1     0     2     2    56    27     5     4 

TABLE 22 
TEST RESPONSES OF HIGH-ERROR SUBJECTS IN EXPERIMENT V, CONDITION F− 

Response (and associated training cues) 

Test      i(ab)       j(ac)       k(ad)       l(eb)       m(ed)       n(fc)    No response 
figure  Both First  Both First  Both First  Both First  Both First  Both First  Both First 
af         7     6    15     7    14     5     7     3     3     2    16     9     8     3 
ae        11     5     8     2    10     6    22    12    11     6     1     1     7     3 
ef         2     1     4     1    12     8    17    10    16     6    13     6     6     3 
bc        11     5    15    10     7     1     8     3     6     3    14     6     9     7 
ce         4     2     8     5     8     4    15     6    16     9    14     7     5     2 
cd         2     1     9     4    18    10    11     5    14     6    12     7     4     2 
bf        16     9     8     3    10     4     8     5     2     1    21    10     5     3 
df         3     2     3     1    19     8     6     2    14     8    17     8     8     6 

TABLE 23 
TEST RESPONSES OF HIGH-ERROR SUBJECTS IN EXPERIMENT V, CONDITION F+ 
[Table body not recovered.] 


TABLE 24 
SUMMARY OF TEST RESPONSES IN EXPERIMENTS III AND V IN PERCENTAGE FORM 
(ALL SUBJECTS, BOTH RESPONSES) 
[Table body not recovered.] 


TABLE 25 
TEST RESPONSE PERCENTAGES IN EXPERIMENTS III AND V POOLED OVER FIGURES OF 
EQUAL AMBIGUITY (ALL SUBJECTS, BOTH RESPONSES) 
[Table body not recovered.] 

TABLE 26 
COMPARISON OF EFFECTS OF AMBIGUITY AND FREQUENCY OF TEST CUES 

Constant ambiguity, unequal training frequency 

                                                 Response percentage 
Test figure   Condition   Frequency   Less frequent   More frequent     O       N 
                          ratio             A               A 
cd and ce     F−          3:2              43.8            38.4        12.3     5.4 
bc            F−          5:3              49.3            28.3        13.0     9.4 
cd            F+          5:3              56.0            29.3         8.0     6.7 
ce            F+          5:2              48.0            40.0         6.7     5.3 
All above figures combined                 48.3            34.9        10.4     6.4 

Differential ambiguity, equal training frequency 

                          Ambiguity          Response percentage 
Test figure   Condition   grouping      A+      A−      O       N 
ef and df     F−          I-L          33.3    44.2    13.8     8.7 


Table 25 provides several comparisons 
in which ambiguity is constant while train- 
ing frequency varies, and in every case the 
frequencies of test responses in the A cate- 
gories exhibit a preference for the cue in 
the test figure which had the lower relative 
frequency during training. Thus, in the 
H-L tests, the A+ preference manifest for 
Experiment III is not evident at all in the 
F− condition of Experiment V, and the 
preference is greatly accentuated when the 
relative frequency of the H to the L cue 
(a to f) goes from 3:1 to 10:1. In the H-I 
tests, preference for A+ increases uniformly 
as the training ratio of H to I (a to e) goes 
from 3:2 in Experiment III to 6:2 in Ex- 
periment V, Condition F−, and 10:2 in Ex- 
periment V, Condition F+. In the I-L 
tests, the moderate A+ preference shown in 
Experiment III disappears when the train- 
ing ratios are changed toward equality in 
the F− condition of Experiment V, but is 
greatly increased when the ratios of I to L 
increase in the F+ condition. 

Table 26 contains the remaining compari- 
sons discussed earlier in the exposition of 
Experiment V. Included are the cases in 
which the test cues are of the same ambigu- 
ity but unequal in training frequency and 
in which the test cues are of differential 
ambiguity but equal in training frequency. 
With respect to effects of variation in 
training frequency when ambiguities are 
constant, these analyses are in accord with 
the relationship apparent in Table 25, al- 
though in some instances (e.g., the F+ data 
in Table 26) the degree of preference for 
the less frequent cue is not directly related 
to the training ratio. The one comparison 
available with ambiguity varied while train- 
ing frequencies were held constant, the F− 
data for the test figures ef and df (Table 
26), yields a higher percentage of A− than 
A+ responses. While this difference may 
not be significant, it certainly indicates no 
trace of a preference for the low ambiguity 
cue. 
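
The ef and df entry of Table 26 can be recomputed from the Both counts of Table 18. The sketch below is only illustrative: the classification rule (A+ for the response trained with the low ambiguity cue f, A− for responses trained with the other test cue, O for all remaining responses) is our reading of the category definitions, and the names are ours.

```python
# "Both" counts for all Ss in Condition F- (Table 18), keyed by response.
# Each response letter is tagged with the training pair it accompanied.
TRAINING_PAIR = {"i": "ab", "j": "ac", "k": "ad", "l": "eb", "m": "ed", "n": "fc"}
COUNTS = {
    "ef": {"i": 2, "j": 7, "k": 13, "l": 37, "m": 26, "n": 44, "none": 9},
    "df": {"i": 3, "j": 4, "k": 32, "l": 9, "m": 27, "n": 48, "none": 15},
}


def classify(test_figure, plus_cue):
    """Split the counts for one test figure into A+, A-, O, and N."""
    minus_cue = test_figure.replace(plus_cue, "")
    out = {"A+": 0, "A-": 0, "O": 0, "N": 0}
    for response, n in COUNTS[test_figure].items():
        if response == "none":
            out["N"] += n
        elif plus_cue in TRAINING_PAIR[response]:
            out["A+"] += n
        elif minus_cue in TRAINING_PAIR[response]:
            out["A-"] += n
        else:
            out["O"] += n
    return out


def pooled_percentages(figures, plus_cue):
    """Pool the classified counts over several test figures, in percent."""
    totals = {"A+": 0, "A-": 0, "O": 0, "N": 0}
    for figure in figures:
        for key, n in classify(figure, plus_cue).items():
            totals[key] += n
    grand = sum(totals.values())
    return {key: round(100 * n / grand, 1) for key, n in totals.items()}


result = pooled_percentages(["ef", "df"], plus_cue="f")
print(result)
```

Under this reading the pooled percentages agree with the tabled values of 33.3, 44.2, 13.8, and 8.7.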


EXPERIMENT VI 


The results of Experiment V serve to 
cast considerable doubt on the hypothesis 
that transfer to new stimulus compounds 
is mediated by a principle of cue selection 
on the basis of minimizing ambiguity. Sub- 
stantial deviations from component model 
predictions occurred when ambiguity was 
confounded with training frequency, but 
disappeared when the confounding was 
eliminated. However, the ordering of test 
response proportions in relation to training 
frequency was not entirely consistent, and 
only one comparison was available with 
cues having equal training frequencies but 
different degrees of ambiguity. The present 
experiment, utilizing the simplified design 
summarized in Table 27, will permit further 
evaluation of the effects of each of these 
variables singly, that is, with the other held 
constant. By reducing the number of train- 
ing syllable combinations, we contrived 
also to reduce the incidence of test response 
failures to negligible proportions. 


TABLE 27 
TRAINING AND TESTING CONDITIONS OF EXPERIMENT VI 

                   Training 
Syllable                      Relative frequencies      Test 
combinations   Responses        F=         F−           combinations 
ab                 i             1          1               ad 
ac                 j             1          1               cb 
dc                 k             1          2               bd 


METHOD 


Subjects. The S pool was again the students in 
introductory psychology; 104 were used in the F= 
condition and 99 in the F− condition. 

Apparatus. Between Experiments V and VI, the 
laboratory was completely rebuilt concomitant 
with a shift in quarters. Six new booths were built 
with acoustic tiles on the sides and ceiling. Each 
booth contained a plywood panel which S faced 
as he sat in the booth. At the top of this panel 
was a 4-inch two-way loudspeaker for communi- 
cation between S and the experimenter (who was 
in an adjoining room). At the center of the panel 
was a plate upon which the display unit appropri- 
ate for a given experiment could be mounted. Just 
below this plate, resting on the surface of a desk, 
was a response-unit containing eight Plexiglas 
windows with a green light below each window 
and a push-button below each green light. The 
overall arrangement and operation of these re- 
sponse units were as described under Experiment 
II for the original units. 

The booths were in a single line so that the six 
Ss run at a given sitting entered from the same 
corridor. Behind the array of booths on the op- 
posite wall of the corridor, between Booths 3 and 
4, was a loudspeaker through which the instruc- 
tions came. The instructions to Ss were all con- 
tained on a two-channel magnetic tape which was 
run through a tape-deck and amplifier connected 
to the loudspeaker. 

As in Experiment III, stimuli were presented 
by digital display units, programmed by a Friden 
tape reader, and responses were recorded on the 
tape of a Friden punch. However, the operation 
was automated in that stimuli were selected and 
randomized by a generalized IBM 709 program 
(written in Fortran), the information in the out- 
put cards from the computer was put on paper 
tape by a Friden Flexowriter, and the tape for a 
squad of Ss run at a given time completely de- 
termined the sequence of experimental events. 
Thus, in proper sequence the paper tape opera- 
tive in the reader turned the tape recorder on for 
instructions, allowed a pause for questions, started 
the learning phase, sounded a warning buzzer, pre- 
sented stimuli and responses, introduced a rest 
pause, started the testing phase, presented test 
stimuli when appropriate, introduced modifying 
instructions where necessary and even triggered 
the final statement that the experiment was com- 
pleted. Some safety devices were introduced to in- 
sure proper experimenter identification of all 
groups, counter resetting, and stimulus counter- 
balancing. 

Various auxiliary paper tape codes—stop, pause, 
tape recorder on, etc.—were entered as symbols in 
the basic symbol list. The various intervals were 
timed by ATC timers; at the end of a given 
timed period the input tape was searched again 
for further instructions. The end of a set of in- 
structions from the tape recorder was accom- 
panied by a signal which initiated further reading 
of the input paper tape. 

Procedure. The general procedure of Experi- 
ments III, IV, and V was followed. However, 
each training block contained only three different 
syllable pairs and there were only three different 
test figures. Eight acquisition blocks preceded the 
testing phase. The test combinations were ran- 
domly interspersed among additional training trials 
during the final two of the three blocks of the 
testing phase, three occurring in each of these 
blocks. 

The letters a, b, c, and d of the experimental 
paradigm (Table 27), were randomly replaced (for 
each squad of six Ss) by the nonsense syllables 
VOP, GAK, FUH, and Ter and the letters i, j, and k 
were randomly replaced by the digits 1, 2, and 3. 
Once a set of these randomizations was made, the 
set was used for one squad under the F= condi- 
tion and one under the F− condition. 

Each training stimulus (i.e., each syllable pair 
shown in Table 27 in each left-right arrangement) 
occurred once per block under Condition F= 
whereas differential frequencies of presentation 
were introduced in Condition F−. There were 
thus six trials per training block under F= and 
eight per block under F−. A set of randomized 
orders of presenting the six training pairs of Condi- 
tion F= was used for each squad of Ss, and the 
same order was used for the corresponding squad 
of Condition F− (correspondence being defined 
by the same stimulus and response assignment) 
with the random insertion of the additional stimu- 
lus-response occurrences within each block. 


RESULTS AND DISCUSSION 


The basic test frequency data for all Ss 
are given in Table 28 and the percentages 
of occurrence of the principal response 
types, arranged by combinations of cue 
ambiguity and training frequency, in Table 
29. As before, A+ denotes the response 
appropriate to the low ambiguity cue of an 
I-L (intermediate-low) test combination 
and A− the responses appropriate to the 
other cue; in this experiment there are, of 
course, no entries in the O (other appro- 
priate responses) category for I-L tests. In 
the case of the L-L tests, A+ represents the 
response appropriate to cue b, which has 
relative frequency of one per training block 
under both conditions, and A− the response 
appropriate to the other cue of the pair. For 
convenience, the ratio of training frequen- 
cies is given for each test pair, the test cue 
combinations under I-L being listed 
separately since their training ratios differ 
in the F− condition. 


TABLE 28 
TEST RESPONSES OF ALL SUBJECTS IN EXPERIMENT VI 

Response (and associated training cues) 

Test      i(ab)       j(ac)       k(dc)    No response 
figure  Both First  Both First  Both First  Both First 
Condition F= 
ad        23    11    53    32   130    59     2     2 
bd        89    44    20    12    94    44     5     4 
cb       125    62    57    30    20    12     6     0 
Condition F− 
ad        25    17    81    42    91    39     1     1 
bd       106    48    29    15    60    33     3     3 
cb       138    68    33    22    27     9     0     0 

A priori predictions of test response per- 
centages from the component model are 
simply 50% A+ and 50% A−, and since 
the incidence of response failures is very 
low, adjusted predictions would be essen- 
tially the same for I-L tests under both con- 
ditions. Some O responses occur on the L-L 
tests, indicating that despite the simplified 
conditions Ss are responding to irrelevant 
aspects of the test situation in a significant 
proportion of cases. However, responses de- 
termined by extraneous factors would be 
expected to fall in the A+ and A− cate- 
gories equally often. 

The principal conclusion to be drawn 
from the pattern of test response propor- 
tions exhibited in Table 29 is that degree of 
preference for the A+ category is uniformly 
related to training ratio but, when this 
variable is controlled, test responding is not 


in the direction of minimum cue ambiguity, 
or validity. For I-L tests following F= 
training, there is an apparent preference 
for the low ambiguity cue (i.e., for the A+ 
response category), but when the relative 
training frequencies are equated (test fig- 
ure ad following F− training) the prefer- 
ence disappears. Data from L-L tests con- 
form closely to component model predictions 
when training frequencies are equal (F=) 
but exhibit a substantial preference for the 
less frequent cue (i.e., for the A+ response 
category) when they are unequal. 
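
The percentages for test figure ad can be recovered from the Both counts of Table 28. The following is an illustrative sketch with names of our own choosing: k(dc) is taken as the A+ response (low ambiguity cue d), and i(ab) and j(ac) together as the A− responses (intermediate cue a). The block for 104 Ss is labeled F= and the block for 99 Ss F−.

```python
# "Both" counts for test figure ad in Experiment VI (Table 28).
AD_COUNTS = {
    "F=": {"i": 23, "j": 53, "k": 130, "none": 2},
    "F-": {"i": 25, "j": 81, "k": 91, "none": 1},
}


def ad_percentages(condition):
    """Percentages of A+, A-, and no-response outcomes for test figure ad."""
    counts = AD_COUNTS[condition]
    total = sum(counts.values())
    return {
        "A+": round(100 * counts["k"] / total, 1),
        "A-": round(100 * (counts["i"] + counts["j"]) / total, 1),
        "N": round(100 * counts["none"] / total, 1),
    }


for cond in AD_COUNTS:
    print(cond, ad_percentages(cond))
```

With equal pair frequencies (d to a ratio of 1:2) the A+ share is 62.5%; with the cue frequencies equated at 2:2 it falls to 46.0%, which is the disappearance of preference noted above.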

It is perhaps of some interest to note that 
the slightly higher percentage for A− rather 
than A+ in test compound ad, Condition 
F−, is in accord with a similar finding for 
test figures ef and df of Experiment V, Con- 
dition F− (see Table 26). The difference 
between A+ and A− in both cases lies in 
the way the equal frequency per block was 
accomplished: for A+ the same learning 
compound was shown repetitively as a unit 
while A− was embedded in different learn- 
ing compounds. 


TABLE 29 
TEST RESPONSE PERCENTAGES OF ALL SUBJECTS IN EXPERIMENT VI 

             Training         Response type* 
Test type    ratio        A+      A−      O       N 
Condition F= 
  I-L 
    ad       1:2         62.5    36.5     -      1.0 
    cb       1:2         60.1    37.0     -      2.9 
  L-L        1:1         42.7    45.2    9.6     2.4 
Condition F− 
  I-L 
    ad       2:2         46.0    53.5     -      0.5 
    cb       1:3         69.7    30.3     -      0.0 
  L-L        1:2         53.5    30.3   14.6     1.5 

* For the L-L tests, A+ denotes the response 
associated with Cue b and A− the response asso- 
ciated with d; for the I-L tests, A+ and A− 
denote responses associated with the low and 
intermediate ambiguity cues, respectively. 


22 ARNOLD BINDER AND W. K. ESTES 


Discussion 


To recapitulate our principal findings: 
(a) Transfer to new compounds following 
training under a standard discrimination 
learning paradigm (involving two training 
patterns with overlapping components) was 
fairly well described by the component 
model. Some deviations from the test re- 
sponse probabilities predicted by the model, 
though not significant, suggested the possi- 
bility that Ss tended to sample preferen- 
tially cues with relatively high validities or 
low ambiguities—that is, cues which had 
been the relatively best predictors of rein- 
forcing events during training. (b) In a 
newly designed experiment with conditions 
arranged to facilitate any tendencies to 
respond in terms of cue ambiguities, Ss ex- 
hibited very marked preferences for low 
ambiguity cues in test compounds. (c) Con- 
trolled comparisons provided by further 
experiments showed that, in many cases at 
least, the apparent preference for low ambi- 
guity cues is actually a matter of preferring 
the test cue of highest “novelty.” 

One’s first reaction to these findings is 
perhaps to ask why the deviations from 
component model predictions have not been 
conspicuous in previous studies of stimulus 
compounding. The principal answer is quite 
likely that, as a general rule, relative fre- 
quencies of the various cues which appear 
in the test compounds have been equal dur- 
ing training. This was the case, for example, 
in the well-known study by Schoeffler 
(1954), which provided the first clean-cut 
experimental demonstration of successful 
prediction of transfer results via the com- 
ponent model, and in the study of transfer 
effects following discrimination learning by 
Estes, Burke, Atkinson, and Frankmann 
(1957). 


Next, one may well wonder why some of 
the auxiliary principles of transfer which 
have arisen independently in other con- 
texts do not seem to enter in any important 
way into the determination of test behavior 
in our studies. The most intuitively com- 
pelling of these auxiliary principles is, per- 
haps, that of cue validity, which was indeed 
the foundation of Restle’s theory of dis- 
crimination learning (1955). According to 
this concept, for which Restle (1955), in 
turn, acknowledges indebtedness to Law- 
rence (1950), S during the course of dis- 
crimination learning comes to ignore cues 
which are ambiguous, that is which are un- 
reliable predictors of reinforcing events, and 
to sample preferentially those cues which 
are less ambiguous, that is, which are relia- 
ble predictors of reinforcing events. An 
aspect of the concept which was not for- 
mally stated by Restle, but which has been 
assumed in many applications, especially in 
the area of concept learning (Bourne & 
Restle, 1959) is that following such training 
Ss should preferentially sample the cues of 
low ambiguity, or high validity, when they 
are encountered in transfer situations. 

Taking, on the one hand the results of 
our Experiment II, in which very strong 
tendencies for responding in terms of the 
least ambiguous cue were manifest on trans- 
fer tests and, on the other, the results of the 
remaining experiments which show little or 
no effect of cue ambiguity when other varia- 
bles are adequately controlled, we are in- 
clined to conclude that a generalized tend- 
ency to sample selectively cues of low 
ambiguity does not spontaneously arise in 
the course of discrimination learning. How- 
ever, it seems clear also that a tendency 
toward repeated selection of a particular 
cue may develop when such consistency is 
specifically reinforced during training. Evi- 
dently, conditions were near optimal in our 
Experiment II, in which the low-ambiguity 
cue was not part of a recurrent pattern dur- 
ing training but was continually re-paired 
with other, "transient," cues. It might be 
noted that the type of reinforcement con- 
tingency existing in this experiment is char- 
acteristic of experiments on concept iden- 
tification, in the context of which the 
hypothesis of selective sampling on the basis 
of cue validity has proved fruitful. 

The notion of transfer responding in 
terms of response communality has been 
most clearly stated by Trabasso and Bower 
(1964) in connection with the interpretation 
of multiple category concept learning ex- 
periments. For example, in discussing the 
learning of a four-category problem, in 
which Ss learn to assign correctly to one of 
four categories stimulus patterns which are 
either circular or triangular in form and 
either orange or blue in color, Trabasso and 
Bower (1964) state the principle of response 
communality as follows: 


We now wish to know the probability with which 
the S will give any one of the four responses to a 
particular pattern shown on trial n (e.g., an orange 
circle). The performance rule is this: the subject 
generates a pair of covert responses for each rele- 
vant attribute and then overtly responds with the 
common element (intersection) from these two 
sets [p. 145]. 


Again, on the basis of our Experiment IV, 
we are inclined to conclude that this type 
of performance tendency does not sponta- 
neously develop in the course of ordinary 
discrimination training, though it may un- 
der the reinforcement contingencies of con- 
cept identification studies. 

The one auxiliary principle which we have 
found to operate ubiquitously, and evidently 
to involve no special conditions of differ- 
ential reinforcement for its appearance, is 
that of inverse frequency, or to put it more 
positively, relative novelty. Almost unfail- 
ingly throughout our entire series of experi- 
ments, Ss have been found to sample pref- 
erentially from test compounds the cues 
which had occurred less frequently during 
previous training. We failed to appreciate 
the power of this effect until relatively late 
in our series of studies because not only in 
the standard discrimination paradigm, but 
also in many of the modified designs used in 
our experiments, there is a negative correla- 
tion between cue validity and relative fre- 
quency. That is, cues of higher validity 
tend to occur less frequently than cues of 
lower validity during training simply as a 
consequence of the fact that the less valid, 
or more ambiguous, cues tend to be common 
to two or more training patterns whereas a 


cue of low ambiguity, or high validity, 
ordinarily belongs to only a single training 
pattern. However, although we were not, in 
fact, prepared for the importance of the 
relative novelty principle prior to our ex- 
periments, it appears in hindsight that we 
might have been, for this principle has 
forced its way to the attention of investiga- 
tors in other experimental contexts. In fact 
the principle of relative novelty is an im- 
portant component of Broadbent’s (1958) 
filter theory: 


...responses to a novel stimuli are in fact par- 
ticularly efficient: it is worth digressing at this 
stage to consider the evidence on this topic. 

A particularly good example was given in Chap- 
ter 2, when describing the experiments of Poulton 
(1956) on listening to messages from several loud- 
speakers. If the messages came more frequently 
from one loudspeaker, a message from the previ- 
ously quiet speaker was more likely to be cor- 
rectly heard than a message from the previously 
busy speaker. In terms of the filter theory we put 
forward earlier, the filter is biassed towards previ- 
ously quiet channels, and information on busy 
channels has a lower chance of reaching the per- 
ceptual system. In ordinary speech, we attend to 
an unusual event rather than a simultaneous usual 
event. This fact is particularly curious because a 
rare event contributes more information than a 
common event, on the measure considered in 
Chapter 3: and a task requiring the analysis of 
more information might be expected to be more 
difficult. It seems well supported by experiments 
in other fields, however. For example, Hyman 
(1953), working with visual reaction times, found 
that as the ensemble of signals increased the 
reaction time went up, confirming the finding by 
Hick (1952) that reaction time was proportional 
to the information in the signal. When Hyman 
altered the frequency of the signals instead of 
having them equiprobable, he found once again 
that this decrease in the average information per 
signal did give a decrease in the average reaction 
time. But reactions to the least common signals 
were faster than they should have been on infor- 
mation calculations. Once again there seemed to 
be an undue bias in favour of the unusual event. 

Berlyne (1951a) also used visual reactions but 
required the subject to react only to one out of a 
group of simultaneous signals; and the most fre- 
quently chosen signal was recorded. After a se- 
quence of similar groups, if a group was presented 
in which all members but one were familiar, the 
unusual signal was the most likely to be chosen. 
The same author, using rats (1950), has introduced 
the animals to particular objects, then removed 
them, and faced them later with some of the 
former objects and a new one. The animals spent 
more time investigating the new object than the 
previously experienced ones. As he points out, the 




time scale is in this case different, but once again 
the unusual stimulus is more likely to elicit a re- 
sponse. 

If the response to a fresh stimulus is particularly 
efficient, it does not seem plausible that the in- 
efficiency of work, done just after a noise has been 
turned on or off, should be due to a general de- 
cline in ability to respond, due to some competing 
startle response. Such a competing response should 
interfere with responses to the novel stimulus it- 
self as well as with responses to other stimuli. It 
seems rather more likely that the man is unable to 
respond to visual stimuli from his task because he 
is taking in information from the ear; so he would 
actually be more efficient on responses to the 
auditory stimulation. Our view of the events within 
the man would be as follows. 

His capacity is limited, and therefore a filter 
placed early in his nervous system selects only part 
of the information reaching his sense-organs. This 
will normally represent information necessary for 
his task. The filter has a bias, however, towards 
channels on which any novel event occurs [pp. 84- 
86]. 


The influence of novelty may even be 
evident in the preference for A— over A+ 
when both are shown equally frequently 
per learning block (see Table 26, test 
figures ef and df, and Table 29, test figure 
ad under Condition F—). The equality of 
frequency was achieved by repetition of the 
full learning compound containing A+, 
which implies a slight edge toward A— in 
novelty. 

Perhaps some emphasis should be placed 
upon the distinction between the novelty 
principle and the matching rule. Matching 
was illustrated in the introduction to this 
report by findings of Binder and Feldman 
(1960). Thus, for example, if the pairing 
ac-A1 occurred twice as often per block as 
the pairing ad-A2, matching leads to the ex- 
pectation that on a test with the single cue 
a, responses A1 and A2 should occur in a 
ratio of close to 2 to 1. While the novelty 
principle refers to cue selection when the 
stimulus is a compound, matching refers to 
response selection when alternative re- 
sponses are appropriate for a particular cue. 

The process of cue selection on the basis 
of novelty has not been an issue in experi-
ments aimed at establishing the conditions
for matching since compounds have not 
typically been involved in the test config- 
urations. One exception lies in the research 
of Feldman (1963) which included within a 


larger array, test compounds requiring cue 
selection. However, even Feldman’s re- 
search is of little relevance for our principle 
of novelty since the phase preceding his 
test trials involved concept rather than dis- 
crimination learning. 

It appears that for the general class of 
transfer situations under consideration in 
this study the component model requires 
augmentation by a novelty principle, but 
we do not have immediately at hand any 
rational basis for a general quantitative 
formulation. A simple first approximation 
would be to assume simply that when cues 
from the training patterns of a discrimina- 
tion learning series are recombined in a 
transfer test, the sampling probability of 
each test cue is equal to the probability that 
it was not the last to appear on training 
trials preceding the test. Thus, if the relative 
frequencies of cues a and b during training 
were in the ratio x:y, the probability that 
cue a would be sampled from the test com- 
pound ab would be y/(x + y). 
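
As an illustration of the provisional rule just stated, the following Python sketch computes the sampling probabilities for a two-cue test compound from the relative training frequencies of the cues. The function name and the restriction to exactly two cues are assumptions made here for illustration; they are not part of the original formulation.

```python
from fractions import Fraction


def sampling_probabilities(freq_a, freq_b):
    """Inverse frequency rule for a two-cue test compound ab.

    freq_a, freq_b -- relative training frequencies of cues a and b (x and y).
    Each cue is sampled with the probability that it was NOT the last of the
    two to appear in training, so the rarer cue is favored.
    """
    total = freq_a + freq_b
    p_a = Fraction(freq_b, total)  # y / (x + y)
    p_b = Fraction(freq_a, total)  # x / (x + y)
    return p_a, p_b


# Cues trained in a 2:1 ratio: the less frequent cue is sampled twice as often.
p_a, p_b = sampling_probabilities(2, 1)
print(p_a, p_b)  # 1/3 2/3
```

Exact fractions are used so that the result can be compared directly with the ratios quoted in the text.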

To gain an idea as to how well this pro- 
visional form of an augmented component 
model accounts for effects in our data that 
appear to reflect the relative novelty prin- 
ciple, let us return to the data for the low- 
error Ss of Experiment I (the first instance, 
in our study, of a significant deviation from 
component model predictions). Cues a and 
e occurred with frequencies in a 2:1 ratio
during training. The estimate of .861 ob- 
tained previously for p, the probability that 
S samples any of the cues presented on a 
transfer test, is unaffected by the new as- 
sumption. Thus the probability of response 
i to test figure ae, is given by 


\[
p\left(\tfrac{2}{3} + \tfrac{1}{6}\right) + (1 - p)\,\tfrac{1}{3}
= .86\,(5/6) + .14/3
= .764,
\]


since, when S responds to the test cues he 
samples e with probability 2/3, and in that 
event certainly makes response i; and he 
samples a with probability 1/3, and in that 
event makes response i with probability
1/2. Proceeding similarly for the other re-
sponses, and converting predicted response
probabilities to frequencies, we obtain 62,
16, and 4 as the predicted frequencies of
responses i, j, and k to be compared to the
observed values 67, 12, and 3, respectively. 
For test figure dc, the predicted frequencies 
prove to be 16, 50, and 16 for i, j, and k, 
corresponding to observed values of 19, 53, 
and 10, respectively. The remaining theo- 
retical entries in Table 5 are unaffected by 
the addition of the inverse frequency rule. 
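
The calculation for response i to test figure ae can be retraced with a short Python sketch. The function name is hypothetical. The term (1 - p)/3 is read here as guessing among the three responses when no cue is sampled, and the total of 82 test responses is inferred from the observed frequencies (67 + 12 + 3); both readings are assumptions rather than statements from the text.

```python
def predicted_probability(p, branches, n_responses=3):
    """Probability of one response on a transfer test.

    p           -- probability that S samples any test cue (.861 in the text)
    branches    -- one (sampling probability, probability that the sampled
                   cue leads to the response) pair per test cue
    n_responses -- number of alternatives; (1 - p) / n_responses is treated
                   as guessing when no cue is sampled (an assumption)
    """
    cue_driven = sum(p_sample * p_resp for p_sample, p_resp in branches)
    return p * cue_driven + (1 - p) / n_responses


# Cue e: sampled with probability 2/3, always leads to response i.
# Cue a: sampled with probability 1/3, leads to response i half the time.
p_i = predicted_probability(0.861, [(2 / 3, 1.0), (1 / 3, 0.5)])
print(round(p_i, 3))  # 0.764

# Expected frequency under the assumed total of 82 test responses.
print(round(82 * p_i, 1))  # 62.6
```

The expected frequency of about 62.6 lies close to the predicted value of 62 quoted above; the small difference presumably reflects rounding in the original computation.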

In Experiment II, the training frequencies 
of cue d and the cue paired with it on a 
transfer test were in each case in the ratio 
2:1, whereas the frequencies, for low-error 
Ss, of test responses appropriate to cue d 
and those appropriate to the cue paired with 
it were in a ratio of approximately 6.3:1 
(Table 8). Clearly, as we had concluded 
on the basis of qualitative considerations 
earlier, the reinforcement contingencies of 
this experiment generate a sampling pref- 
erence for the unambiguous cue far too ex- 
treme to be accounted for by the inverse 
frequency rule. 

For the data of Experiment III, the 
augmented component model again proves 
quite adequate. Considering the data for 
the low-error Ss, in which substantial ex- 
cesses of A-- responses occurred for all 
types of transfer tests (see discussion 
above), and using the same p estimate as 
before, we obtain predicted A+ proportions 
of .585, .516, and .524 for H-L, H-I, and 
I-L tests, respectively, corresponding to ob- 
served values of .583, .617, and .522. 

The best quantitative test of the aug- 
mented model is provided by Experiment 
VI. The number of different training pat- 
terns was small enough so that the simple 
form of the inverse frequency rule might be 
expected to hold to a good approximation, 
but at the same time, owing to the training 
conditions, a considerable variety of train- 
ing-test relationships is available. 

In this instance, even the a priori predic- 
tions, utilizing no parameters estimated 
from the data, come off rather impressively. 


For the F= condition, the predicted pro- 
portion of A+ responses is .677 for both 
test figures ad and cb, and .500 for bd (in 
which case A+ is the response associated 
with b); the observed proportions (exclud- 
ing the O and N categories) are .631, .619, 
and .486 for the three figures respectively. 
For the F— condition, a priori predictions 
of .500, .750, and .667 for test figures ad, 
cb, and bd may be compared to the ob- 
served values .462, .697, and .639. 

These results may serve to indicate that 
the inverse frequency principle, even in a 
very provisional quantitative form, corrects 
the main disparities between predictions 
from the component model and our data 
for various test conditions. Although some- 
what better fits to our present data than 
those shown above could be obtained by 
systematically reestimating the various 
parameters, it does not seem worthwhile to 
push on in this direction. For one thing, the 
quantitative properties of the relative nov- 
elty principle need further specification. It 
seems almost certain that the weighting 
factor associated with the cue which repre- 
sents its relative novelty must change as 
some systematic function of time since last 
occurrence. By analogy with related proc- 
esses that have been dealt with in stimulus 
sampling theory, it seems likely that an ex- 
ponential decay function will be called for 
(see Atkinson & Estes, 1963, pp. 219-223). 
It will probably prove expedient to under- 
take development of the more elaborate 
model in connection with suitably designed 
new experiments in which time intervals 
between occurrences of a given cue on train- 
ing and testing trials can be controlled ac- 
cording to simpler schemes than those ob- 
taining in the present study. From our
present findings we conclude only that a 
relative novelty principle has considerable 
support at a qualitative level and is evi- 
dently a variable requiring careful atten- 
tion in the design and interpretation of all 
types of transfer studies. 
Evoked cortical responses were obtained in a number of studies dealing
with various aspects of visual perception. On the basis of the variations
noted in the complex response pattern under the different conditions it has
been possible to identify certain components of that pattern as being re-
lated to specific aspects of the stimulus situation, such as intensity, color,
and background level. In addition, the overall evoked response pattern
appears to be directly related to phenomena encountered in the study of
the perception of flickering stimuli.


The development of average response
computers has been of great value to 
those involved in the study of human 
sensory mechanisms and perception, hav- 
ing made it possible to conduct both psy- 
chophysical or perceptual studies and 
neurophysiological studies with the same 
“intact” subjects (Ss). This marks an im- 
portant step toward the realization of 
Fechner’s proposed “inner” psychophysics. 
The information to be gained by the use of 
this technique is of course of an entirely 
different nature than that being obtained 
by microelectrode techniques. Hopefully, it 
might relate more closely to integrative 
processes of the central nervous system and 
thus help bridge the gap between single-cell 
responses and subjective experience. 

For the past several years our group 
has been studying human evoked cortical 
responses, primarily using visual stimuli. 
In order to better understand the nature 
of these responses, and to establish the 
range of conditions over which they might 
be useful as “objective” correlates of per- 
ceptual phenomena, a wide variety of situ- 
ations have been investigated. On the 
basis of the results of these studies it has 
been possible to identify certain compo- 
nents of the evoked response pattern as 
being related to various aspects of the
stimulus situation. In the present paper, 
data from a number of these studies are 
presented to illustrate the nature of the 
differences which have been found, and a 
tentative classification of the various com- 
ponents of the visually evoked response pat- 
tern is presented. 


EFFECTS OF VARIATIONS IN
STIMULUS INTENSITY AND 
BACKGROUND LEVEL 


This study was performed in order to 
provide a general idea of how the complex 
evoked response pattern changes under 
varying conditions of stimulus intensity and 
adaptation level. A “ganzfeld” stimulus was 
utilized, achieved by having half a table 
tennis ball attached over the eye stimu- 
lated, so that complexities which might be 
introduced by the presence of any marked 
contours in the field of vision would be 
avoided. In all the studies to be described in 
this report the stimulation was monocular 
(right eye), and the recording monopolar 
(between left occipital region and the left 
earlobe) unless stated otherwise. 

The stimulus consisted of a single flash 
from a Grass photo-stimulator (P3) set at 
its highest output level (I-16). The light 
from the photo-stimulator entered S's 
shielded cubicle through a filter holder 
which was mounted flush with a 2½-in.
square hole cut in an opaque plastic 
“window” which was at S's eye level. Im- 
mediately below this stimulus aperture 
there was a second similar arrangement
through which the light from a small spot- 
light could be directed at the face of S, 
thus providing a variable background 
level on the “ganzfeld.” The S was so 
positioned that his face was approximately 
30 in. from the stimulus aperture. 

In order to provide a suitable range of 
conditions the background level was set 
such that the stimulus was just detectable 
to S when the stimulus flash was reduced 
in intensity by means of a 2.0 log neutral 
density (ND) filter. Four intensity levels 
were used, separated by ½ log steps, and
four background levels, separated by 1 log 
step. The stimulus flashes were presented 
at the rate of 1/sec. The responses to 200 
such stimuli were summed by the computer 
of average transients (CAT) for each of 
the 16 conditions. This constituted a 
day’s run for the S. Four such runs were 
conducted on each S on consecutive days 
so that an estimate of the degree of day-to- 
day variability in the response could be 
obtained. Two Ss were utilized for the 
complete series as described. A number of 
other Ss were tested under selected condi- 


tions so that an estimate of individual 
differences could be obtained. 
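
The benefit of summing the responses to 200 flashes can be illustrated with a small simulation. The sketch below is not a model of the CAT or of any recorded data: the waveform shape, noise level, and sample count are hypothetical values chosen only to show how averaging time-locked records reduces background noise. The average used here is simply the sum divided by the number of records.

```python
import math
import random


def hypothetical_waveform(n_samples):
    """A single smooth bump standing in for an evoked response (assumed shape)."""
    center, width = n_samples / 2, n_samples / 10
    return [math.exp(-((t - center) ** 2) / (2 * width ** 2)) for t in range(n_samples)]


def simulate_sweep(signal, rng, noise_sd=5.0):
    """One single-flash record: the signal plus independent background noise."""
    return [value + rng.gauss(0.0, noise_sd) for value in signal]


def average_sweeps(sweeps):
    """Point-by-point mean of records that are time-locked to the flash."""
    n = len(sweeps)
    return [sum(column) / n for column in zip(*sweeps)]


def rms_error(record, reference):
    """Root-mean-square deviation of a record from the reference waveform."""
    return math.sqrt(sum((r - s) ** 2 for r, s in zip(record, reference)) / len(record))


rng = random.Random(0)
signal = hypothetical_waveform(100)
sweeps = [simulate_sweep(signal, rng) for _ in range(200)]
averaged = average_sweeps(sweeps)

single_error = rms_error(sweeps[0], signal)
averaged_error = rms_error(averaged, signal)
print(single_error, averaged_error)
```

With independent noise the residual error of the average falls roughly as the square root of the number of records, which is why a response invisible in a single record can emerge clearly after 200 repetitions.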

The complete results for one S are shown 
in Figure 1. The four daily runs have been 
superimposed to show the degree of rep- 
licability obtained. The day-to-day vari- 
ability shown in these records is in agree- 
ment with the reports of others who have 
studied this problem (e.g., Dustman & 
Beck, 1963). Individual differences in the 
overall waveform of the response were 
quite striking. The various components of 
the complex response pattern were present 
in all the individuals studied, but with 
differing relative amplitudes, thus creating 
markedly different waveforms for different
Ss. This is also in line with the findings of 
other workers. The details of the response 
pattern can be seen more clearly in Figure 
2, a single day's run for this same S. 

Some general features are immediately 
obvious. The increase in the overall mag- 
nitude of the response as we go from the 
near-threshold condition at the lower right 
to the high-contrast condition at the upper 
left seems to agree quite well with the 
corresponding changes in perceived flash 
brightness. The same is true in general for 
the differences in overall amplitude within 
the columns and within the rows. The
very marked increase in amplitude that 
occurs in the immediate vicinity of the 
near-threshold condition should also be 
noted. We shall discuss this aspect in more 
detail in a later section. 

In the preceding paragraph the term 
“overall” magnitude was used for a defi- 
nite reason. A careful study of response 
patterns in Figure 2 shows that the various 
components that go to make up those pat- 
terns tend to behave quite differently as 
the experimental conditions are varied. To 
illustrate this point observe what happens 
to the three positive peaks occurring be- 
tween 100 and 200 msec. after stimulation. 
These can be seen most clearly in the 
upper left quadrant of the figure. We shall 
refer to them as C1, C2, and C3, in order
of time of occurrence. 

In the first column (3.0 log ND back-
ground), for example, observe what hap-
pens to the three peaks in question as the
flash intensity is reduced. The change in
C3 (200 msec.) is most striking, it being
almost eliminated completely when the
stimulus intensity is decreased by one log
unit. C1 (100 msec.) is similarly affected
by a decrease in flash intensity, but not to
the same degree, there still being a well-
defined response at the lowest stimulus in-
tensity used. The amplitude of C2, on the
other hand, is relatively unaffected by the
decrease in stimulus intensity. The main
effect on C2 appears to be a marked in-
crease in peak latency as the stimulus is
decreased. It should be noted at this time
that the peak latencies of C1 and C3 do not
change appreciably at a given background
level, regardless of change in stimulus in-
tensity.

Now observe what happens to our three
positive peaks as we change background
level, leaving stimulus intensity constant.
The relative flash intensity of 1.5 (third
row) illustrates these changes quite clearly.
As the background level is increased we
see that C1 tends to increase up to a point
and then to decrease as the background
level is increased still more. The fates of
C2 and C3 are even more interesting. C2,
the most outstanding feature of the re-
sponse at the lowest background level,
rapidly decreases in amplitude as the back-
ground is increased, being gone by the
time the background has been increased by
two log units. C3, on the other hand, starts
as a mere pip on the waveform pattern at
the lowest background level but increases
in amplitude very markedly as the back-
ground level is increased. Along with this
increase in amplitude there is a significant
decrease in the peak latency, amounting
to from 30 to 40 msec. over the range of
background intensities used.

We noted earlier that the amplitudes of
Peaks C1 and C3 were directly related to
stimulus intensity under the low-back-
ground situation. For the highest stimulus
intensity used, therefore, these two are
fairly large even at the lowest background
level, and an increase in the background
does not appear to bring about any great
changes in their amplitudes (top row of
figure). The decrease in the peak latency
of C3 with an increase in the background
level does still occur. No such latency
shift is noted for C1, however. An ex-
amination of all the conditions shows this
to be the case. The peak latency for C1
did not change, even under conditions
which produced marked changes in the
peak latencies of C2 and C3. For the
highest stimulus intensity condition, C2
behaves as it did under the other condi-
tions, decreasing in amplitude as the back-
ground level is raised, being entirely absent
at the highest background level.

On the basis of the above observations
some tentative hypotheses can be put forth
regarding the nature of the three compo-
nents in question. In the first place, it
appears that C2 is in some way related to
scotopic visual activity. The fact that this
component tends to be present only under
the lower background conditions, and in-
deed is the outstanding positive component
in the lower-left quadrant of the figure
(representing both low-background and
low-stimulus intensity) attests to this con-
clusion. On the other hand, both C1 and
C3 appear to be related to photopic visual
activity. The differences noted between the
behavior of C1 and C3, however, lead to
the further conclusion that each is related
to a different type of photopic activity.

In order to justify the conclusion that
C1 and C3 probably relate to different
photopic processes their differences should
be reexamined. (a) In terms of peak la-
tency, C1 remained essentially constant
over the entire range of stimulus intensities
and background levels studied. The peak
latency for C3 decreased significantly
under the higher background conditions.
(b) In terms of amplitude, C1 tended to
vary directly with the stimulus intensity
for any given background level. With C3
this correspondence held only at the lower
background levels. At the higher back-
ground levels some other factor seemed to
be effective. Noting again the striking
growth in amplitude of C3 as a function of
background intensity, especially for the
lower stimulus flash intensities (and ex-
cluding the near-threshold condition), it
seems very likely that this other factor
might be related to the phenomenon of
photosensitization or photic potentiation
(Chang, 1959). The suggestion being made
is that our C3 seems to be related to a type
of photopic activity that is greatly in-
fluenced by such a process.


COLOR EFFECTS


The tentative identification of a scotopic 
component and two differing photopic com- 
ponents in the complex response pattern 
has led to the hope that color-specific re- 
sponse patterns might be identifiable. Figure
3 presents two examples of the kinds
of results that give more reason for such a 
hope. Figure 3B is given here to illustrate 
the effect of a change in background level 
on the response pattern obtained from a 
different S (ML) than the one whose rec- 
ords have been discussed. In this particular 
situation the stimulus consisted of a com- 
bination of red and green (produced by 
Wratten filters 29 and 74). The results ob-
tained under two conditions are shown su-
perimposed. In one, S's room was darkened
while in the other a medium-intensity 
background was projected onto the halved 
table tennis ball covering his eye. We 
would predict from the results of our first 
S (RH) that an increase in the background 
level should bring about a reduction in 
amplitude in the region of time occupied
by C2 (about 130-160 msec.), whereas C1
and C3 might be accentuated. Such appears
to be the case. At least the component
relating to scotopic activity is common to
both Ss.

Fig. 3. A. Sample records illustrating the difference in S RH's responses evoked by red and blue
stimulation. B. Sample records illustrating the effect produced by varying the background level. S
ML. Note that the component occurring at around 150 msec. (C2) is most strongly affected.

In Figure 3A we return to the first S 
(RH) to show his differing responses to 
red and blue light (Wratten numbers 29 
and 49b). The highest flash intensity avail- 
able was used (I-16 on the Grass photo- 
stimulator, with no neutral density filters) 
and the background was one log unit lower 
than the lowest used for the records shown 
in Figures 1 and 2. The difference in this 
S's responses to red and blue is quite strik- 
ing and obvious. Component C3 appears to
be entirely absent in the response to blue
light. Instead, we have another positive
component (C4) that peaks about 35 msec.
after C3 should have appeared. The occur-
rence of a sizable C4 under these conditions
is further evidence that it relates to a
different activity than does C3.

Figure 4 presents another example of 
how the different components of the evoked 
response pattern may vary as a function
of the stimulus color and how such differ- 
ences can be enhanced by variations in the 
background level. This was also a ganzfeld 
situation. In the “ALL” condition the 
colored stimuli (Wratten numbers 29, 74, 
and 49b) were combined by allowing all 
three to strike the ganzfeld simultaneously. 
It should also be noted that this S's re- 
sponse to blue is quite different than that 
of S RH (shown in Figure 3A). Component 
C3 is definitely minimized in this case also,
but C4 is not such an outstanding feature.
With this S, Component C1 is more in
evidence when the background illumina-
tion level was raised. The response pat-
terns obtained for the “ALL” conditions
were very similar to those obtained in 
other studies when this S was stimulated 
by white light. 

The records shown in Figure 4 lend
strong support to the conclusions arrived
at earlier concerning the nature of C1, C2,
and C3. C2 again seems to be related to
scotopic activity, being sharply reduced in
amplitude when the background illumina-
tion was raised. C1 and C3 again appear to
be related to photopic activity, with C1
being most responsive to green and C3
being most responsive to red. These re-
sults indicate that this approach should be
very fruitful for the study of responses to
color and are quite in line with the results
reported by Shipley, Jones, and Fry
(1965), when allowances are made for dif-
ferences in experimental techniques.

Fig. 4. Examples of S ML’s responses to differ-
ent stimulus colors. Note that the differences are
enhanced by raising the background level.


EVOKED RESPONSE PATTERN AND
PERCEIVED NUMBER 


It has long been known that when one 
observes a flickering light source the per- 
ceived rate of flicker is not necessarily 
equal to the actual rate. As the result of an 
extensive series of studies on this topic, 
wherein trains of visual stimuli consisting 
of various numbers of flashes at various 
subfusional rates were presented to Ss who 
reported the number of flashes perceived 
for each such train of flashes, the conclu- 
sion was reached that: 


the number of flashes reported by the subjects
depended primarily on the time it took to present
a stimulus sequence and not on the number of
stimuli in that sequence. Perhaps the most im-
pressive aspect of the data was the extreme relia-
bility of the responses to certain number-rate
sequences. For example, (for a certain level of
background illumination) all the subjects always
reported having seen 2 flashes whenever 5 flashes
at 30 per second were presented to them, and they
always reported having seen 3 flashes whenever 10
flashes at 30 per second were presented.... The
results of this study convinced the authors that
the limiting of the perceived number of flashes was
due to some basic physiological process [White,
1963, p. 8].


Fig. 5. An example of S RH’s response to high
intensity white light stimulation upon a low level
background. Summation of 200 responses, with
ganzfeld in use. Negative downward. Graph at
bottom of figure represents the results of an earlier
study in visual perception. N, represents the num-
ber of “flashes” most often reported by Ss when
they were presented trains of various numbers of
flashes (No) at the rate of 30 fps.


The retina was at first suspected as the 
locus of this limiting process, but this was 
ruled out by studying the electroretino-
gram (ERG) obtained under high rates of
photic stimulation. The retina was re-
sponding to every flash presented to it, so
it was assumed that some more central 
process was limiting the number of flashes 
perceived. Further investigation led to the 
development of the “visual numerosity 
function," which showed the maximum 
number of flashes S could perceive as a 
function of the duration of stimulation. 
The visual numerosity function consists 
of two distinct segments: (a) from the 
onset of stimulation up to about 250-300 
msec.; and (b) from about 250-300 msec. 
on. A more detailed description is as fol- 
lows: 


First, there is an initial fusion period, during 
which time the subject reports that he sees only 
a single flash; following this there is a short period 
when the function rises rapidly, the slopes indicat- 
ing a rate of increase of perceived flashes of about 
12-13/second. This rate is not maintained, how- 
ever, but instead the function tends to level off 
about 200 milliseconds after the onset of stimula- 
tion. At about 250-300 milliseconds after onset, 
the second major segment of the numerosity func- 
tion begins. As a result of all the studies done on 
this topic to date, there is good agreement to the 
fact that the slope of this portion of the function 
beyond 300 milliseconds indicates a rate of in-
crease of perceived flashes of approximately 6-7
per second [White, 1963, p. 20]. 


The initial fusion period was found to 
vary with the background level, so it was 
classified as being related to a peripheral 
process (dark adaptation). 

The general description of the numer- 
osity function just given sounds very much 
like a description of the evoked response 
pattern itself. This pattern consists of two 
main parts, a complex transient response 
ending about 250 msec. after stimulus on- 
set and a rhythmic afterdischarge which 
appears to start as the transient ends. 
Figure 5 shows such a pattern in rather 
pure form. This particular sample was ob- 
tained by summing the responses to 200 
single flashes of white light. The time of 
the flash is indicated by the arrow. The 
response pattern has been shifted about 50 
msec. to the left in relation to the time 
markers on the abscissa in order to adjust 
somewhat for the latency. The time values 
are meant to represent the time after the 
first neural activity evoked by the onset 
of stimulation has reached the cortex. This 
is a very rough approximation, but it will 
serve our present purpose. In the lower 
part of the figure, data from one of the 
series of studies on “temporal numerosity” 
are presented in graphical form. These par- 
ticular results were obtained under con- 
ditions which have yielded the greatest 
perceived number of flashes as a function 
of flash-train duration (White & Cheat- 
ham, 1959). The values plotted are the 
modes. The ordinate (N,) represents the 
number of flashes reported, while “N,” is 
the number of flashes presented in a given 
flash train. N, is plotted along the time 
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Fic. 6. Responses of S RH to trains of five red 
or blue flashes of light presented at 25 fps. N, rep- 
resents the number of "flashes" perceived by 8 
under these two conditions. Run 1 and Run 2 were 
performed on consecutive days. Arrows indicating 
stimuli have been offset 50 msec. to allow 
(roughly) for lateney. Ganzfeld condition used. 
Highest stimulus intensity: available was used, pre- 
sented upon a high-level background of white 
light. N = 200 for each record. Negative down- 
ward. 


base to show how long it took to present a 
given train of flash stimuli at the repetition 
rate used in that study (30 fps.). For N,'s 
of one and two, the most frequent response 
was “one”; for N,’s of three and four, the 
most frequent response was “two”; for N,’s 
of five and six, it was “‘three;” for N,'s of 
seven and eight, it was “four;” while for 
an N, of nine the responses were equally 
distributed between “four” and “five.” 
When 10 flashes were presented at this rate 
the most frequent response made by Ss 
was “five,” 3 
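
The modal reports just listed can be set against the time taken to present each flash train. The Python sketch below tabulates them; the duration formula (one full flash period per flash presented) is an assumption made for illustration, since the text does not state exactly how train duration was measured, and the names are hypothetical.

```python
RATE_FPS = 30  # repetition rate used in the study described above

# Most frequent report(s) for each number of flashes presented, as stated in
# the text; two entries mean the reports were divided equally between them.
MODAL_REPORTS = {
    1: (1,), 2: (1,),
    3: (2,), 4: (2,),
    5: (3,), 6: (3,),
    7: (4,), 8: (4,),
    9: (4, 5),
    10: (5,),
}


def train_duration_msec(n_flashes, rate_fps=RATE_FPS):
    """Assumed duration of a train: one flash period per flash presented."""
    return 1000.0 * n_flashes / rate_fps


for n_presented, reports in MODAL_REPORTS.items():
    label = " or ".join(str(r) for r in reports)
    duration = train_duration_msec(n_presented)
    print(f"{n_presented:2d} presented  {duration:6.1f} msec  reported: {label}")
```

Laid out this way, the table shows the reported number rising by one for roughly every two flashes added at this rate, so that the report tracks train duration rather than the number of flashes presented.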

If these responses are considered in re-
lation to the evoked response pattern in
the upper portion of the figure a rather
remarkable thing is seen: the appearance
in time of each successive perceptual unit
seems to coincide with the occurrence of
the successive components of the evoked
response pattern. Remembering that the
evoked response pattern shown is produced
by single flashes, this must mean that the
onset of stimulation in some way initiates
a process (or processes) which can have a
marked influence on the perceptual re-
sponse to any succeeding stimulation. The
further implication is that the evoked re-
sponse pattern should not be greatly
changed by the presence of more than one
flash in a sequence. Figures 6 and 7 show
that such is indeed the case.


3 The difference in the number of flashes per-
ceived in this case, as compared to the data
quoted earlier, is explained by the fact that a
much higher background level was used here. This
minimized the duration of the initial fusion pe-
riod. It should also be noted that the values given
for the slope of the temporal numerosity function
were based on the mean number of flashes per-
ceived, and thus may be deceptively low.

In Figure 6 we have examples of one 
of our Ss’ response patterns evoked by 
trains of five flashes separated by 40-msec.
intervals (25 fps.). Since this S had previously
exhibited a markedly different evoked pat- 
tern in response to red and blue light (Fig- 
ure 3A), this was also made a variable in 
this case. The two sets of response patterns, 
Run 1 and Run 2, represent replications of 
the experiment obtained on successive 
days. Each record represents the summa- 
tion of 200 responses. The positions of the
five arrows representing the stimulus
flashes have again been displaced in time
in order to try to account somewhat for 
latency. 


Fig. 7. Response pattern evoked by trains of
25 flashes of white light presented at rate of 50 fps.
Ganzfeld condition, with relatively high-intensity
stimulus (I-8 level of photo-stimulator) on high-
level background. Summation of 200 responses.
Note the marked off-effect. Negative downward.
The S reported perceiving five or six “flashes” each
time the stimulus train was presented.
As indicated in Figure 6, our S consist-
ently reported having seen three flashes
when the five red flashes were presented,
and two flashes when the five blue flashes
were presented. This difference seems to
be related to the fact that one of the major
components (C3) is absent in the response
pattern evoked by the blue stimuli. The
results of this particular experiment tend
to verify three things. First: the number of
perceived flashes S reports in response to
a train of flash stimuli is limited by the
temporal characteristics of the cortical re-
sponse pattern evoked by those stimuli.
Second: the characteristics of the evoked
cortical response pattern appear to be
determined by the nature of the stimulus
conditions at the time of onset. Third:
this particular S exhibits reliably different
evoked patterns for red and blue stimuli.

One last example of the relationship be- 
tween the number of perceived flashes and 
the evoked cortical response pattern is 
shown in Figure 7. In this case, white light 
was used, again with the ganzfeld. The
flash intensity was fairly high (setting 
number 8 on the photo-stimulator), upon a 
medium background. The evoked responses 
to 200 flash trains were summed. The stimu- 
lus trains consisted of 25 flashes, each flash 
separated by 20 msec. (50 fps.). Since in the 
various studies on “temporal numerosity” 
(the term which has been used to describe 
the perceived number phenomenon being 
discussed) the highest stimulus rate ever 
used was 30 fps., there were no perceptual 
data to compare with the evoked pattern. In 
this case it was decided to try to predict 
what the perceptual response would be on 
the basis of the evoked pattern. In Figure 7 
it can be seen that there were six wave com- 
ponents during the period of time the in- 
termittent light stimulus was being pre- 
sented. Because of this it was predicted 
that the greatest number of flashes he 
would report having seen would be six. 
Other considerations, such as the possibil- 
ity of an extended period of fusion im- 
mediately after onset of stimulation and 
the knowledge that the rate at which addi- 
tional perceptual units are added to a 
perceived sequence decreases sharply after
a duration of 300 msec., led to the conclu-
sion that he would also report having seen
only five flashes at least part of the time. 
All this time S remained in his shielded 
room unaware of the predictions which 
were made. Upon being asked to report 
the number of flashes perceived after each 
flash train was presented he began by re- 
porting “six.” With continued repetitions 
he began to report seeing only “five” part 
of the time. At no time did he report having 
perceived anything other than “five” or 
“six.” Later, a number of our other
personnel were presented with this same
stimulus train. All agreed that there ap-
peared to be five or six flashes in the
sequence.

As the result of the extensive studies on 
“temporal numerosity” which were per-
formed it was concluded that the onset of
stimulation triggers some central process 
(or processes) which interacts with afferent 
neural activity in such a way as to limit the
rate at which perceptual units can be 
added. The studies on evoked response pat- 
terns have shown that the onset of stimula- 
tion does indeed trigger some central cyclic 
processes, whose temporal characteristics 
are very much like those of the hypotheti- 
cal processes. Both the perceptual data and 
the evoked response patterns show an im- 
portant change in character at a point 
about 250 msec. after onset. This marks 
the end of the initial complex pattern and 
the beginning of the rhythmic aftereffect. 
In the perceptual data this marks the point 
where the rate at which perceived flashes 
are added to a sequence changes from 
about 12/sec. to about 6/sec. During this 
first segment there seems to be a one-to-one 
relationship between the components of the 
response pattern and number of flashes 
which can be perceived. During the second
segment there seems to be a two-to-one 
relationship between the cyclic brain proc- 
esses and the perceived events. (The basic 
frequency of the rhythmic aftereffect ap- 
pears to be equal to S's alpha rhythm.) 
Thus there definitely appears to be a close 
relationship between the temporal numer- 
osity phenomena and the evoked cortical 
response pattern. The exact nature of this 
correspondence and its functional signifi-
cance, if any, are not clear. It is tempting,
however, to view this relationship in terms
of the cyberneticists' “scanning” mechanism
(Wiener, 1948) and the concept of the “psy-
chological moment” (Stroud, 1955; White,
1963). It is also of interest to note that the 
duration of the initial phase of the evoked 
response pattern (about 250-300 msec.) 
corresponds to a duration that has been 
found to be critical in studies dealing with 
complex visual discrimination, especially
those associated with contour processes and 
identifications (Kolers, 1964; Schlosberg, 
1965). This appears to give more substance 
to the concept, inherent in much of the pre- 
vious discussion, that the components of the 
evoked response pattern are related in some 
way to the various aspects of the informa- 
tional processing of visual stimuli. 
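
The change in rate noted above (about 12/sec. before the 250-msec. point and about 6/sec. after it) can be put into a small calculation. What follows is only a sketch under a simplifying assumption that is not part of the studies described: the two rates are treated as exact constants with a sharp break at 250 msec. The function name and its parameters are hypothetical, and the count it returns covers only the units added after onset, so whatever is perceived at onset itself is not included.

```python
def units_added(duration_ms, break_ms=250.0, early_rate=12.0, late_rate=6.0):
    """Perceptual units added over a stimulus of the given duration.

    Assumes (for illustration only) a constant rate of `early_rate` per second
    up to `break_ms` and a constant rate of `late_rate` per second afterwards.
    """
    early = min(duration_ms, break_ms)        # time spent in the initial segment
    late = max(duration_ms - break_ms, 0.0)   # time spent in the rhythmic segment
    return (early * early_rate + late * late_rate) / 1000.0


if __name__ == "__main__":
    for d in (100, 250, 500):
        print(d, "msec. ->", units_added(d), "units added")
```

Because the rates are only approximate in the data, the output should be read as an order-of-magnitude illustration of the two-segment relationship and not as a prediction for any particular S.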


EVOKED RESPONSE AND PERCEIVED
BRIGHTNESS 


In the discussion of the evoked re-
sponse patterns shown in Figure 1 it was
pointed out that the amplitude of some of 
the components seemed to vary directly 
with the intensity of the light stimulus. 
This is especially true of the component 
referred to as C1 (peaking at about 100
msec. after onset of stimulation). It can
be seen that for any given background level 
this is the case. Component Cs (peaking 
at about 200 msec. after onset) seems to 
be useful in this regard only when the 
background level is relatively low. 

A study was performed in order to de-
termine how well the amplitude of these
components would correlate with perceived
brightness in a very marginal situation.
Stimuli consisting of pairs of identical
flashes separated by 9, 16, or 25 msec. were
presented to a group of Ss. All of these
flash pairs were perceived as single flashes
by Ss. In order to establish the relative
brightness of these three stimuli a temporal
forced-choice procedure was carried out,
the results of which showed that the per-
ceived brightness decreased as the inter-
flash interval increased. The second phase
of the study consisted of obtaining evoked
response patterns for the two smaller inter-
flash conditions (9 and 16 msec.). It was
found that the amplitudes of the critical
components (C1 and Cs) were significantly
different for the two conditions. Figure 8
illustrates the type of results obtained.
(Each condition was replicated twice in
this study in order to check on the reliabil-
ity of the results.)

Fig. 8. Evoked potentials obtained in response
to fused flash pairs having interflash intervals of 9
and 16 msec. Onset at start of trace, each trace
representing the summation of 100 flash pairs in one
channel of the computer. All four records obtained
during a single session, in counterbalanced order.
Negative downward.

It is interesting to compare the response 
patterns in Figure 8 with those in Figure 2, 
which were obtained earlier from this 
same S. The differences noted between the 
responses to the two conditions in Figure 
8 are seen to be very much like those shown 
in the first column of Figure 2 for relative 
flash intensities of 1.0 and 1.5. 

This particular study has been reported
in greater detail elsewhere (Bartlett &
White, 1965).

Fig. 9. Evoked response patterns recorded si-
multaneously from scalp over both occipital lobes.
High intensity pinhole source, fixated foveally,
with medium background level. Summation of 200
responses.


EVOKED RESPONSES TO MINIMAL
VISUAL STIMULI 


The studies described so far have all 
been concerned with the responses elicited 
by full-field stimulation. If there are basic 
differences between the responses elicited 
by stimulation of foveal and peripheral 
regions of the retina, as is indeed the case,* 
they would not be revealed by such pro- 
cedures. For this reason, among others, it 
was decided to determine the nature of 
the response to minimal stimulation, both 
in terms of the physical size of the stimulus 
and its intensity. 

* An extensive study dealing with variations in
the evoked response as a function of the location
of the stimulus in the visual field has been per-
formed by Eason and White (in preparation). In
addition to such position effects, marked lobe
dominance effects were indicated.

One important result of our work along
this line was the general finding that if a
given stimulus situation was perceived by
S an evoked response could also be ob-
tained. This was noted earlier in regard to
Figure 1 and Figure 2 for the base condi-
tion (near-threshold) in the lower right
corner. The present case was a more dra- 
matic illustration of this principle, how- 
ever. In trying to discover how small a 
visual stimulus we could use and still ob- 
tain an evoked response it was found that 
there seemed to be no effective lower 
limit—if the stimulus were visible a re- 
sponse would be elicited. So, for our mini- 
mal visual condition we used what was 
quite literally a pinhole source. The face of 
the photo-stimulator was masked by black 
electrician’s tape, in the center of which a 
very minute hole was produced by the 
point of a needle. A piece of tracing paper 
was placed flush with the masking tape, 
both to provide a diffuse light source and 
a white surround. The location of the 
stimulus hole was indicated by a black 
circle about 2 mm. in diameter. The back- 
ground light was raised to a level such 
that the fixation circle could be seen 
clearly. 

A sample response pattern obtained 
under the conditions just described is pre- 
sented in Figure 9. Here the highest in- 
tensity produced by the stimulator was 
being utilized and was perceived by S as 
an intense spot of light. It can be seen that 
the form of the response is very similar to 
those shown in Figure 2 in the right-hand 
column, where the full-field stimulus was 
rather weak and the background level was 
high. 

Since we could obtain an evoked re- 
sponse to this very minute stimulus it was 
decided to see how the nature of the re- 
sponse would vary as a function of various 
stimulus parameters. Of particular inter- 
est was the effect of stimulus duration. In 
one study, trains of flashes (with an inter- 
flash interval of 10 msec.) were used to 
provide varying durations of stimulation. 
The lowest intensity level produced by the 
photo-stimulator (I-1) was used, and the 
intensity of the background was adjusted 
so that a single flash was approximately 
at S's threshold. As additional flashes were 
added to the trains the perceived bright- 
ness and the apparent size of the stimulus 
light increased markedly. The evoked re- 
sponse paralleled this increase, being barely 
detectable for the one-flash condition and 
increasing to a maximum amplitude for
the four-flash condition, representing a 
total duration of stimulation of 30 msec. 
It was assumed that this was the critical
duration for the conditions of the study. 
A comparison study was carried out using 
a glow modulator tube as the light source. 
Here the duration of a continuous light 
source could be varied. The results were 
similar to those obtained with the fused 
flash trains. A detailed report on these two 
studies is being prepared. 
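
The durations quoted for the flash trains follow from simple arithmetic, which the sketch below spells out. It assumes (as the 30-msec. figure for the four-flash, 10-msec. train implies) that duration is counted from the onset of the first flash to the onset of the last, with each flash itself taken as negligibly brief. The function names are hypothetical and are introduced only for illustration.

```python
def train_duration_ms(n_flashes, interflash_ms):
    """Time from onset of the first flash to onset of the last flash."""
    if n_flashes < 1:
        raise ValueError("a train needs at least one flash")
    return (n_flashes - 1) * interflash_ms


def flash_rate_per_sec(interflash_ms):
    """Flash rate implied by a given onset-to-onset interval."""
    return 1000.0 / interflash_ms


if __name__ == "__main__":
    # Four-flash pinhole train, 10 msec. between flashes.
    print(train_duration_ms(4, 10))                            # 30
    # 25-flash ganzfeld train of Figure 7, 20 msec. between flashes.
    print(train_duration_ms(25, 20), flash_rate_per_sec(20))   # 480 50.0
```

Under this convention a single flash has a duration of zero, which is why the one-flash condition serves as the baseline against which the longer trains are compared.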


Discussion 


It has been shown that the evoked corti- 
cal response, as obtained with a device 
such as the computer of average transients, 
can be of great value in the study of a 
wide variety of visual problems. Stimuli 
ranging from minute point-sources up to 
the ganzfeld can all be utilized, as can 
situations involving thresholds, brightness 
perception, and color vision. Certain stud-
ies dealing with the temporal character-
istics of the visual process have also been
successfully carried out, both by ourselves 
(as described above) and by other workers 
(Donchin, Wicke, & Lindsley, 1963). Pre-
liminary studies comparing the responses
evoked by ganzfeld stimulation with those 
evoked by structured fields have suggested 
the possibility that certain aspects of 
form perception could also be approached 
by this technique. 
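
The reason summing many time-locked responses brings out the evoked pattern can be illustrated with a generic signal-averaging sketch. It is not a description of how the computer of average transients is built; the waveform, the noise level, and all names below are assumptions chosen for illustration. A fixed waveform is buried in random background activity on each of 200 trials, and the trials are averaged (a sum divided by the number of trials, which has the same shape as the summed record).

```python
import math
import random


def average_epochs(epochs):
    """Point-by-point mean of equally long, stimulus-locked epochs."""
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]


def rms_error(trace, reference):
    """Root-mean-square difference between a trace and the true waveform."""
    total = sum((a - b) ** 2 for a, b in zip(trace, reference))
    return math.sqrt(total / len(reference))


rng = random.Random(0)  # fixed seed so the illustration is repeatable

# Hypothetical evoked waveform: a damped 10-cps oscillation, 500 msec. at 2-msec. steps.
times = [i * 0.002 for i in range(250)]
signal = [math.exp(-t / 0.2) * math.sin(2 * math.pi * 10 * t) for t in times]

# 200 trials, each the same waveform plus independent background noise.
epochs = [[s + rng.gauss(0.0, 2.0) for s in signal] for _ in range(200)]

averaged = average_epochs(epochs)
single_err = rms_error(epochs[0], signal)
avg_err = rms_error(averaged, signal)

if __name__ == "__main__":
    print("single trial error:", round(single_err, 3))
    print("averaged error:   ", round(avg_err, 3))
```

With independent noise the residual error falls roughly as the square root of the number of trials, so 200 trials reduce it by a factor of about 14. Activity that is not independent of the stimulus from trial to trial, such as habituation, is not removed by this procedure.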

The types of studies mentioned in the 
preceding paragraph all deal with the phys- 
ical aspects of the stimulus situation. A 
number of previously published reports by 
various workers have demonstrated the 
marked effect of subjective factors on the 
evoked response patterns. These have been 
referred to by such terms as “attention,” 
“vigilance,” “level of activation,” and 
“meaningfulness of the stimuli” (e.g., 
Chapman & Bragdon, 1964; Eason, Aiken, 
White, & Lichtenstein, 1964; Spong, 
Haider, & Lindsley, 1965). All such factors 
probably contribute to the intrasubject 
variability of response which is found by 
all workers in this field. The well-estab- 
lished habituation of response to repetitive 
stimulation, undoubtedly related to the 
subjective factors listed, is also of im- 
portance in this regard. 


The changes in evoked response patterns 
related to the subjective factors listed are 
of great interest and value, but when one 
is trying to relate those patterns to psy- 
chophysical phenomena such variability is 
most troublesome. Experience has shown 
that one must expect such variability in 
any study being contemplated and do 
everything possible in the way of experi- 
mental design to minimize the effect. Short 
runs of any one condition, frequent op- 
portunities for S to leave the experimental 
area, and counterbalancing of the various 
conditions over time are all essential. The 
fact that there is marked intersubject and 
intrasubject variability must be considered 
to be a blessing instead of a curse, since it 
attests to the sensitivity of the technique. 
Such sensitivity places greater demands on 
the ingenuity of the workers using the 
technique if they hope to derive the full 
benefit from it, but the end result should 
be well worth the added effort. 

The examples of evoked response pat- 
terns which have been presented show that 
there are marked differences in the nature 
of the response under various conditions of 
stimulation. At one extreme is the very 
simple oscillatory pattern produced by the 
pinpoint of light striking the fovea; at the 
other is the complex pattern produced by 
high-level ganzfeld stimulation. On the 
basis of the changes noted in the response 
pattern under various conditions (color, 
intensity, background level, and retinal 
locus of stimulation) the conclusion has 
been reached that there are a number of 
component responses, each related to some 
aspect of the stimulus situation. It is fur- 
ther concluded that the evoked response 
pattern produced by the high-level ganz- 
feld stimulation is a composite of these 
various responses. In Figure 10 a sample 
response pattern to ganzfeld stimulation is 
presented, along with a tentative break- 
down of the component elements. 

Process I appears to be related to the 
degree of scotopic activity. Its amplitude 
varies with the relative intensity of the 
stimulus flash—becoming greater as the 
background is reduced with a given flash 
intensity and also as the flash intensity is 
increased with a given background level 
(Figures 1 and 2). 

Fig. 10. Tentative classification of processes and components making up the complex response pat-
tern evoked by high intensity white light stimulation in the ganzfeld condition. Sample record shown is
the same as that in Figure 5.


Process II is assumed to be related to 
photopic activity. Data from the color 
studies and the brightness discrimination 
study (both described earlier) indicate its 
probable relation to a green response and/ 
or a photopic brightness discrimination 
mechanism. Its peak latency is relatively 
constant under all conditions of stimulus 
intensity and background level. 

Process III is assumed to be related to 
some aspect of photopic activity. This as-
sumption is based on the color studies,
where Peak Cz was so evidently related to 
a red response (Figures 3, 4, and 6), and 
on the pinhole stimulus situation (Figure 
9) wherein only the fovea was stimulated. 
This process is differentiated from the 
other presumably photopic response, Proc- 
ess II, by the fact that its latency varied 
considerably as a function of stimulus in- 
tensity and background level. Its latency 
is seen to decrease as the total amount of 
light in the stimulus situation increased,
including both the stimulus flash and the 
background level (see Figure 2). 

Process IV has tentatively been classi- 
fied as being related to a blue color and/or 
a scotopic response mechanism. The rea- 
sons for this were pointed out earlier, and 
are clearly shown in Figures 3 and 4. The 
positive peak, Cz, is seen to be very sensi- 
tive to background level, its amplitude 
decreasing markedly as background inten- 
sity is increased. The probable relationship 
of Peak C, to a blue response mechanism 
is shown most strikingly for S RH in Fig-
ure 3.⁵ It has been found for this S, with
blue light stimulation, that Peak C2 is 
more sensitive to changes in background 
level and Peak C, is more sensitive to 
changes in flash intensity. This would also 
suggest that C, might be related to a 
photopic response to blue light. 

One final comment should be made re- 
garding the possible significance of the 
various components of the complex evoked 
response pattern. A preliminary study in 
which responses to ganzfeld stimulation 
were compared to those evoked by a struc- 
tured visual field indicated that the later 
components, around 250-300 msec. (C4 and 
Cs), were of much greater relative ampli- 
tude in the structured field situation. If 
further investigation verifies this point it
is believed that this may be relevant to 
perceptual studies on the “serial processing 
of visual information" such as those of 
Kolers (1964), who found that approxi- 
mately l5 sec. was necessary for the as- 
similation of contour information. 

5 It should be pointed out that the response pat-
terns shown for RH in that figure were taken from
parametric studies which were carried out for var-
ious colors of the stimulus flash. For each stimulus
color used, flash intensity and background level
were varied as they were for the white stimulus
flashes in the first study described (Figures 1 and
2). We are not planning to publish the more de-
tailed report on color responses until we have a
chance to run complete parametric studies on
more subjects, so that a more meaningful com-
parison of individual differences can be made.

The form of the evoked response pat-
terns obtained is dependent on a number of
factors regarding experimental technique,
so it is difficult to compare one's results
with those of other workers unless the
identical procedures were followed. Elec-
trode placement, the use of monopolar or
bipolar electrodes, and the time between 
stimulus presentations are three such fac- 
tors which can lead to a wide divergence in 
the results obtained. In all the situations 
reported here a monopolar occipital elec- 
trode was utilized, and the stimuli were 
always presented about 1 sec. apart. 
Other variables of importance, such as 
whether a ganzfeld stimulus or a restricted 
field was used, are indicated in the various 
sections. 

The portion of the evoked response pat- 
terns which we have been concerned with 
in this paper corresponds to that which 
Cigánek (1961) has termed the “secondary 
response” and Spreng and Keidel (1963) 
refer to as the “medium components.”
These authors agree that the characteristics 
of these components indicate that they 
are related to activity of the nonspecific, 
perhaps diffuse afferent pathways. These
characteristics are the long latencies in- 
volved, the fact that these components can 
be recorded from broad areas of the scalp, 
and the fact that they are elicited by 
stimulation of various sense modalities. 
The point of interest here, however, is 
that correlations with specific aspects of 
stimulation within a given modality can 
be obtained if inputs from the other mo- 
dalities are precluded. 

Even though certain of the components 
occurring during the period following stim- 
ulation in question (roughly 80-300 msec.) 
may be evoked by the various modalities, 
it is still quite possible that some of the 
other components might be related to 
a specific type of input. For example, the 
prominent “medium components” evoked 
by acoustical stimulation, as presented by 
Spreng and Keidel (1963), appear to be 
most closely related to those components 
we have tentatively identified as being re- 
lated to photopic activity in our work with 
visual stimulation (i.e., the components des- 
ignated as C; and Cs). It seems quite 
possible that the unique dual nature of 
the visual system, wherein the scotopic- 
related neural activity occurs later than 
that of the photopic-related activity, might 
well give rise to evoked components that 
are modality specific. The component we
have designated as C», which appears to 
be related to scotopic activity, is suggestive 
in this regard. In other words, it might 
be well to consider the photopic and sco- 
topic visual systems as two separate sub- 
modalities in view of the different time 
courses of their neural activity following 
stimulation. 

There is one rather major way in which 
our tentative description of the events fol- 
lowing the onset of stimulation differs 
from those of the other workers referred to. 
The striking change in the character of the 
complex evoked response pattern which 
occurs around 250-300 msec. after the on- 
set of stimulation is usually described as 
marking the end of the “secondary re- 
sponse” or “middle components” and also 
the beginning of the “rhythmic” or “oscil- 
latory” afterdischarge. There is the def- 
inite implication that this afterdischarge 
is of a different nature than that occur-
ring during the “secondary response” in
such a description. In our studies utilizing
as purely a photopic stimulus as possible
(a pinpoint source of light, fixated foveally, 
with a light-adapted eye) it was found 
that the oscillations clearly started between 
100 and 200 msec. after stimulus onset 
and continued on into the time domain of 
the so-called “afterdischarge” (Figure 9). 
Therefore we interpret the oscillatory ac- 
tivity occurring after 250-300 msec. fol- 
lowing onset of stimulation as being a 
continuation of a process which was initi-
ated during the time of the “secondary
response.”

The marked change in the character of 
the waveform in the region of 250-300 
msec. is believed to be related to the dying 
out of other processes, which do exist only 
during the period of the "secondary re- 
sponse." These processes are the ones des- 
ignated as I and IV in Figure 10. The 
termination of these processes would leave 
Process III, in relatively pure form, as the 
oscillatory afterdischarge. 


HETEROMODAL EFFECTS UPON VISUAL THRESHOLDS


EDWARD T. DAVIS 
Veterans Administration Hospital, Bedford, Massachusetts 


Processes underlying the transmission and coordination of 2 different kinds 
of sensory excitation were studied. A neurological model accounting for
specific heteromodal effects was proposed. The method involved the de-
termination of visual thresholds in normal and brain-injured Ss while they
were being subjected to an auxiliary aural stimulus of moderately loud in- 
tensity. The results demonstrated group differences in the effect sound has on 
visual thresholds and provided information on the diminishing effectiveness 
of a constant auxiliary stimulus when it is maintained for a period of several 
minutes. The findings were reviewed in the light of past and present theoreti- 
cal explanations and related to a brain model which accounts for both 
facilitative and inhibitory effects of auxiliary stimulations. 


The mechanisms by which all kinds of
everyday sensory excitations are some- 
how compounded into a meaningful experi- 
ence for an individual must certainly be an 
action of a most complex order. Although we 
have little knowledge of the processes me- 
diating the final consummation of experi- 
ence, there can be little doubt that there 
are definite physiological prerequisites for 
this phenomenon. 

One approach to the problem just men- 
tioned has been the study of very basic in- 
tersensory effects such as the influence a 
sound has on an absolute visual threshold. 
As a means of making intelligible the re- 
sults of such studies, most investigators 
have offered theoretical explanations which 
involve the transmission and coordination 
of neural excitation in the central nervous 
system. 

In an attempt to avoid complexities that
are apt to occur in any perceptual study,
experimenters have tried to eliminate any
overlay of meaning from the stimuli they
have used. Attempts have been made to
keep the subjects as objective as possible,
and thus tasks involving a minimum of in-
terpretation are the most useful. Usually a
small white patch of light and a reasonably
pure tone of a given frequency are used in
such studies.

3 Based in part on a doctoral dissertation sub-
mitted to the Department of Social Relations at
Harvard University. The author wishes to express
his appreciation to G. S. Klein for his guidance and
encouragement. Additional thanks go to W. S.
Verplanck and G. A. Miller whose advice and
technical knowledge were essential in determining
the method of measuring visual thresholds in this
study.

The study to be outlined here is an at- 
tempt to clarify elementary heteromodal 
relationships. The objectives may be briefly 
stated as follows:

1. To extend the examination of hetero- 
modal processes by considering temporal 
and intensity factors in the effects of an 
auditory stimulus on a visual threshold. 

2. To study subjects suffering from se- 
vere brain injury in the hope of supporting 
or opening to question some aspects of neu- 
ral explanations. 

3. To develop a theoretical model which 
would allow a parsimonious explanation of 
these extended intersensory relationships. 

The exact processes underlying the 
transmission and coordination of excitation 
in the cortex have intrigued both psycholo-
gists and neurophysiologists. Kohler and
Wallach (1944) and Lashley, Chow, and 
Semmes (1951) have used behavioral cri- 
teria as a means of exploring electrical en- 
ergy transmission in intracortical processes 
and Chang as early as 1952 ventured to 
suggest a type of neural process that could 
account for the influence of auditory stimu- 
lation on a visual pathway. Jung (1961) 
and Jung, Kornhuber, and Da Fonseca 
(1963) have summarized a wealth of infor-
mation, based on the electrocortical stud-
ies of many investigators. They have re-
lated neural functioning to a variety of
subjective phenomena which more often are 
thought to lie within the province of the 
psychologist. 

The theoretical background of the pres- 
ent study was influenced by Hebb's (Hebb, 
1949) attempt to apply neurophysiological 
concepts to the explanation of behavior. It 
seemed reasonable to assume, as Hebb did, 
that today's neurophysiology could pro- 
vide facts and theories that would allow 
the construction of a testable hypothesis 
for the examination of intersensory effects. 
Many of the sensory pathways can be 
shown to be anatomically proximate at 
some point in the brain; and recent inves- 
tigators, both psychologists and neuro- 
physiologists, have theorized that in the ac- 
tion following heteromodal stimulation 
some type of neural communication must 
exist to account for the results obtained by 
accessory stimulation. 

Present explanations suggest that the 
neurons of the tested modality are in- 
creased in excitability because of an ana- 
tomical proximity to neurons of the acces- 
sory modality that are firing at the same 
time. The increased excitability is based on 
the theory that the local potential of neu- 
rons in the tested modality is raised by the 
electrical activity of the neurons firing in 
response to the accessory stimulus. In this 
way the threshold is lowered. In such a 
theory, a neuron that is capable of respond- 
ing to the excitation of at least two modali- 
ties is an essential element. 

Early experiments demonstrating the 
convergence of different sense modalities on 
neurons of the reticular formation (Amas- 
sian & DeVito, 1954; Baumgarten, Von 
Mollica, & Moruzzi, 1953; Scheibel, Schei- 
bel, Mollica, & Moruzzi, 1955) have led to 
the examination of many other polysensory 
areas including the cortex. More specifi- 
cally, single unit analysis has shown that 
the same cell may be fired by afferent vol- 
leys from different sensory pathways 
(Amassian, 1954; Segundo & Machne, 
1956). 

From a psychological point of view, how- 


ever, these studies raise an important ques- 
tion. Once such a polysensory element is 
fired, to which sensory experience does such 
a mutually recruitable neuron contribute its 
effect? 

Hebb (1949) has suggested that time re- 
lationships are of extreme importance in 
neural organization. Single neural units be- 
come functionally integrated cell assem- 
blies through simultaneity and ordering of 
firing. Thus a neuron may contribute to 
one “phase sequence” or another, depend- 
ing upon the time at which it fires. Segundo 
(Segundo & Machne, 1956) has suggested 
that a convergence of this type upon a 
single neural element does not necessarily 
mean complete loss at that level of the ca-
pacity to discriminate between different
sensory stimuli. He points out that the tem- 
poral pattern of response of that unit to 
each stimulus may be very different. In 
this manner information may be preserved 
on the basis of a temporal criterion even in
the presence of spatial convergence. Such a 
possibility would allow the neuron that re- 
sponds to two sensory excitations to con- 
tribute its effect to first one and then the 
other sense modality. Its selectivity would 
depend upon the relative rate at which the 
two modalities bombarded it with excita- 
tion. That is, the modality that bombarded 
it with the greater frequency would have 
the better chance of finding such a mutu- 
ally recruitable neuron in a nonrefractory 
state and firing it. Since frequency of neu- 
ron discharge is related to intensity of 
stimulation, the sensory receptor receiving 
the greater amount of excitation would 
“steal” or “capture” the disputed neuron. 
It does not mean, however, that subliminal 
excitation arriving from the accessory stim- 
ulus might not lower the threshold of a 
neuron and thus facilitate its firing in the 
primary system by spatial or temporal 
summation. 
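
The "capture" argument of the preceding paragraph can be illustrated with a toy simulation. It is a sketch only, resting on assumptions that are not in the sources cited: each modality is taken to deliver impulses at random (Poisson) times, the shared neuron fires to any impulse that finds it nonrefractory, and the refractory period is fixed at a hypothetical 5 msec. Subliminal summation is left out. All names and parameter values are illustrative.

```python
import random


def simulate_capture(rate_a_hz, rate_b_hz, refractory_s=0.005,
                     duration_s=10.0, seed=0):
    """Count how often each modality fires a shared neuron.

    Impulses from modalities "A" and "B" arrive as independent Poisson
    trains. An impulse fires the neuron only if the neuron is not still
    refractory from its previous discharge.
    """
    rng = random.Random(seed)

    def arrival_times(rate_hz):
        t, times = 0.0, []
        while True:
            t += rng.expovariate(rate_hz)  # exponential gaps give a Poisson train
            if t >= duration_s:
                return times
            times.append(t)

    events = [(t, "A") for t in arrival_times(rate_a_hz)]
    events += [(t, "B") for t in arrival_times(rate_b_hz)]
    events.sort()

    captures = {"A": 0, "B": 0}
    ready_at = 0.0  # earliest time at which the neuron can fire again
    for t, source in events:
        if t >= ready_at:
            captures[source] += 1
            ready_at = t + refractory_s
    return captures


if __name__ == "__main__":
    # Modality A bombards the neuron four times as often as modality B.
    print(simulate_capture(200.0, 50.0))
```

In this toy version the share of discharges credited to each modality tracks its share of the arriving impulses, which is the sense in which the more intensely stimulated modality "captures" the disputed neuron.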

The existence of many such neurons 
varying in threshold of excitability is the 
key postulate in the theoretical organiza- 
tion of this thesis. Such a hypothetical neu- 
ronal scheme has been postulated by Jung 
(Jung et al., 1963). In general the connec-
tions are inferred from the results of neu-
ronal recordings from the cat's cortex. He
has conjectured that inasmuch as reticu-
lar neurons already receive the convergence
of several sensory modalities, a subcortical 
component of multimodality input is very 
probable. However, he also includes the 
possibility of intercortical connections. 

The postulates used in this thesis are 
necessarily psychological in nature since 
the data collected are gleaned from behav-
ioral events. They are stated with no inten-
tion of testing neurological concepts, but
the underlying reasoning for these assump- 
tions is based on the theorizing of previous 
investigators and is extended and sum- 
marized by the use of a hypothetical brain 
model. 

In order to trace the development of the 
thinking in this paper, essential character-
istics of the brain model are described and
summarized immediately prior to the speci-
fication of the formal postulates associated
with this study. Both types of explanation
are developed simultaneously with the in-
tention of making the two levels of reason-
ing clearly distinguishable.

Figure 1 depicts a neurological model of- 
fered as a schematic analogue of the proc- 
esses under investigation. It is borrowed 
from the thinking of Jung (Jung et al., 
1963) and is presented with the expectation 
that it is capable of generating heuristic 
possibilities. 

As Fessard (1961) has pointed out, neu- 
ral models are simplified neural circuits of 
special design and it is unsound to infer 
close resemblances in internal organization 
and the performance of other operations 
known to occur in nature without substan- 
tial support for such reasoning. Jung 
(Jung et al., 1963), however, believes that 
such a caution does not preclude the ex- 
pediency of developing common lines of rea- 
soning in the reduction of problems that 
require both neurophysiological and psy- 
chological knowledge. Licklider (1961) has 
suggested that psychophysical models offer 
the possibility of bringing together in pro- 
ductive interaction, facts and findings from 
a variety of sources. To support this view he 
has proposed a model drawn from a variety 
of disciplines which accounts for the anal-
gesic effect music and noise have on pain
thresholds. 

Since there are, as yet, no definite neuro- 
physiological observations underlying ef- 
fects found in this study, the explanation 
offered is admittedly speculative. Although 
speculation may be hazardous it fulfills an 
important purpose when it offers perspec- 
tive to a field of inquiry or when it stimu- 
lates research designed to replace specula- 
tion by factual demonstration. It is hoped 
that the following model fulfills both of 
these criteria. 


POSTULATES AND THE ASSOCIATED
BRAIN MODEL


The model represents a neural network
assumed to exist in the human brain. Neu- 
rons I and II are neurons in primary path- 
ways serving two different senses. The neu- 
ral chains linking Neuron I and Neuron X, 
and Neuron II and Neuron X are function- 
ing cerebral parts of their respective sen- 
sory modalities (I and II). Neuron X is a 
cerebral neuron. It is assumed that the net- 
work represented by Figure 1 exists in 
large numbers in the human brain. The 
functional properties of this network in- 
clude the following elements: With each 
sensory modality there are a number of 
cortical neurons which may be fired by 
more than one kind of modal excitation 
and which contribute to the intensity of 
experience of the modality by which they 
are fired. The frequency of neural dis- 
charge in a modal network is positively 
related to the strength of the sensory stim- 
ulus. 

Postulate 1. Any sense modality has a
process P which may be stimulated by
stimuli of that modality or by stimuli
from other modalities (e.g., P_v is a func-
tion of the visual stimulus I_v, and/or the
intensity of other stimuli such as I_a, an
aural stimulus).

Fig. 1. Neurological brain model.

It has already been suggested that a mu- 
tually recruitable neuron (or mutually re-
cruitable neurons) is selected and fired by
one modality or the other. Implicit in this
description is the assumption that temporal
relationships allow neural threshold re- 
sponses to contribute to only one modality 
at a time. Thus, if a group of neurons 
which were originally a part of the visual 
process P_v are more heavily bombarded by
excitation from another modality they
shift their effect to the latter system. That
is, once fired a neuron contributes its ef-
fect to either the visual or auditory proc-
esses but not to both. The relationships of 
such an assumption are more generally 
stated in the following postulate form: 

Postulate 2. If P is aroused to a thresh-
old of contribution by I_a it is no longer
aroused to a threshold of contribution by
I_v and no longer functions as a part of the
I_v system.

The process by which mutually recruita- 
ble neurons shift their effect has already 
been described as positively related to the
frequency with which a given modality excites
them. That is, the probability of a given 
mutually recruitable neuron (X) being 
fired by a given modality (P 1) is a direct 
function of the ratio of frequency of neural 
discharge at that modal synapse with X
(w) to the sum of all modal discharge fre-
quencies synapsing with X (w plus y); that
is,

\[
\frac{\text{frequency } M_1}{\text{frequencies } M_1 + M_2 + \cdots + M_n}
\]


The relationship is postulated in the for-
mal model as follows:

Postulate 3. P_v bears a relationship to
visual and auditory stimuli at levels of
threshold contributions such that:

\[
P_v = f\!\left(\frac{I_v}{I_v + I_a}\right)
\]
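The ratio stated in Postulate 3 can be made concrete with a small numerical sketch. The code below is illustrative only: it assumes that the unspecified function f is the identity and that intensities are expressed in arbitrary units, neither of which is stated in the text. The names `capture_probability` and `visual_process` are hypothetical.

```python
def capture_probability(frequencies, index):
    """Share of the total discharge frequency at neuron X that comes from
    the modality at position `index` (the ratio M_1 / (M_1 + ... + M_n))."""
    total = sum(frequencies)
    if total <= 0:
        raise ValueError("at least one modality must be discharging")
    return frequencies[index] / total


def visual_process(i_v, i_a):
    """P_v for a visual intensity i_v and an auditory intensity i_a.

    Assumption: f is taken to be the identity, so P_v equals the ratio itself.
    """
    return capture_probability([i_v, i_a], 0)


# Arbitrary illustrative intensities, not values from the study.
print(visual_process(10.0, 0.0))   # no tone present
print(visual_process(10.0, 30.0))  # tone three times as intense as the light
```

With no auditory input the whole pool of recruitable neurons serves vision; as the auditory intensity grows relative to the visual one, the visual share falls, which is the capture effect the postulate describes.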


It has been demonstrated that neural
tissue becomes fatigued under continuous
stimulation; that is, the frequency of firing
of a given fiber drops even though the
strength of the stimulus remains the same.
With a constant stimulus, the fre-
quency of neural discharge in its modal 
network does not remain constant but re- 
duces to approach a stable lower level as 
the stimulus is maintained. This effect is 
accounted for in the following postulate: 

Postulate 4. The value of I_v and I_a is a
function of the duration of the appropriate
stimulus, S_v and S_a, which has produced
them. The relationship is such that the in-
tensity decreases with time since the appli-
cation of the stimulus.

With the experimental procedure used 
in this study, the visual stimulus is not as 
constant as the auditory tone used. The 
visual stimulus fluctuates in intensity and 
is often interrupted by the involuntary 
eye blinking of the subject. Since neurons 
regain their excitability after continuous 
stimulation within a short recovery period, 
this relationship is summarized thus: The 
rate of decline of the frequency of neural 
discharge in a modal pathway is positively 
related to the constancy of the stimulus 
impinging upon this avenue of excita- 
tion. It is formally accounted for in Postu- 
late 5: 

Postulate 5. The value of I_v for a
given S_v tends to maintain its maximal
value if a recovery period exists between
the cessation of the last application of
S_v and the current application of S_v.

Since the data in this study are psycho- 
logical data, it is necessary to postulate a 
relationship between the assumptions just 
listed and the behavior tested. Neurologi- 
cal evidence demonstrates that the inten- 
sity of experience of a sensory stimula- 
tion is positively related to the number of 
cortical neurons fired by that stimulus in 
its modal area. From this the formal 
statement becomes Postulate 6: 

Postulate 6. The reported visual thresh-
old is proportional to the reciprocal of P_v
(i.e., the larger P_v, the lower the thresh-
old T_v).

A consequence of the characteristics 
ascribed to the brain model allows the 
following prediction: A mutually recruita-
ble neuron will tend to be captured by the
modality bombarding it with the greatest
frequency of excitation.

The formal reasoning leading to the first
theorem rests on the formal postulates.
Since we have assumed that P_v can be
measured by a visual threshold (Postu-
late 6), certain effects of a moderately
loud auditory stimulus on a visual thresh-
old can be deduced.

Postulate 1 states that P can be
aroused by excitations from the audi-
tory and/or visual system. Postulate 2
states that a contribution effect of P can-
not serve as a part of the visual and audi-
tory systems at the same time. Since, by
Postulate 3, P is recruited in a direct rela-
tion to the intensity of a given stimulus,
then Theorem 1 follows:


Theorem 1. A moderately strong audi- 
tory stimulus will initially raise the visual 
threshold of normal subjects. 

Again, since excitation resulting from 
stimulation decreases with time (Postulate
4) it is only necessary to assume the exist-
ence of a recovery period for the visual 
pathway in order to predict a recovery of 
the visual threshold with time (Postulate 
5). Such a recovery period for the visual 
pathway could be produced by a fluctuat- 
ing target light and the eye blinking asso- 
ciated with the fixation of a visual target. 
The auditory pathway would have no such 
respite. Thus, the more frequently excited 
and more constantly excited modal path- 
way will fatigue earlier. 

With the addition of Postulates 4 and 5 
the second theorem may be deduced. 

Theorem 2. As the sound is continued, 
the raised visual threshold will tend to 
drop, in normal subjects. 
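The deduction of Theorems 1 and 2 can be traced in a toy simulation. The sketch below is not the author's model; it only shows that the postulates, given some concrete functional forms, yield an initial rise and a later partial recovery of the visual threshold. Every form and constant is an assumption made for illustration: the auditory excitation is taken to decline exponentially toward a lower plateau (Postulate 4), the visual excitation is held at its maximum (Postulate 5), the visual process P_v is the ratio I_v / (I_v + I_a) (Postulate 3 with f as the identity), and the threshold is T_v = k / P_v (Postulate 6).

```python
import math


def auditory_intensity(t, onset, i_a0, floor=0.4, tau=60.0):
    """Effective auditory excitation at time t in seconds.

    Assumed form for Postulate 4: zero before tone onset, then an
    exponential decline from i_a0 toward the plateau floor * i_a0.
    """
    if t < onset:
        return 0.0
    elapsed = t - onset
    return i_a0 * (floor + (1.0 - floor) * math.exp(-elapsed / tau))


def visual_threshold(t, onset=150.0, i_v=10.0, i_a0=5.0, k=80.0):
    """T_v = k / P_v, with P_v = I_v / (I_v + I_a).

    I_v is held constant, standing in for Postulate 5 (blinks and target
    fluctuation are assumed to give the visual pathway recovery periods).
    All constants are arbitrary illustrative values.
    """
    i_a = auditory_intensity(t, onset, i_a0)
    p_v = i_v / (i_v + i_a)
    return k / p_v


# Tone onset at 150 s corresponds to the 2.5-minute point of a 6-minute run.
for t in (120, 150, 210, 360):
    print(t, round(visual_threshold(t), 2))
```

Under these assumptions the threshold jumps when the tone begins and then declines toward, without reaching, its presound level as the auditory pathway fatigues.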

Theorem 3 is based on the assumption 
that severe cortical lesions often interrupt 
the paths of communication from primary 
projection neurons to commonly recruitable 
neurons. Several possible ways in which 
this might occur follow: 

1. Damage to, or extirpation of, cortical 
tissue might reduce the number of mutually 
recruitable neurons. 

2. The location of the lesion may actually 
impinge upon and reduce the efficiency of
neural pathways which offer the means
of communication. 

3. Edema associated with tissue damage 
might extend the area of altered tissue 
functioning beyond the site of damage. 

4. It is frequently observed that in in- 
dividuals with gross brain damage there is 
neural discharge at the periphery of the 
lesion. Such discharge could reduce the ef-
ficiency of communicating neural links if
these paths were excited by the lesion dis-
charge. In summary, neurons and
neural functioning are impaired by brain
damage. Postulate 7 includes these possi-
bilities:

Postulate 7. There is a class of events 
D (which includes brain injury), which 
impairs process P. 

Since neural communication will be less
effective where tissue damage has occurred,
the capturing of mutually recruitable neu-
rons will be less, with accessory stimula-
tion, than where undamaged tissue exists.

Theorem 3 follows from the inclusion of 
Postulate 7: 

Theorem 3. The visual threshold of sub- 
jects possessing an element from the class 
of events D (brain injury) will be raised 
less than those of subjects free of events 
D, with the introduction of a moderately 
strong auditory stimulus. 
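Theorem 3 can be illustrated in the same spirit. The sketch below assumes, purely for illustration, that an event of class D acts by letting only a fraction of the auditory excitation reach the mutually recruitable neurons; the `coupling` parameter, the constant `k`, and the intensities are hypothetical and do not come from the study. The threshold is again taken as T_v = k / P_v with P_v = I_v / (I_v + coupling * I_a).

```python
def threshold_rise(i_v, i_a, coupling, k=80.0):
    """Rise in the visual threshold at tone onset.

    `coupling` in [0, 1] is the assumed fraction of auditory excitation
    that still reaches the recruitable neurons (1.0 = intact pathways).
    """
    if not 0.0 <= coupling <= 1.0:
        raise ValueError("coupling must lie between 0 and 1")
    p_v = i_v / (i_v + coupling * i_a)
    return k / p_v - k


normal_rise = threshold_rise(10.0, 5.0, coupling=1.0)
damaged_rise = threshold_rise(10.0, 5.0, coupling=0.3)
print(normal_rise, damaged_rise)
```

Any reduction in coupling shrinks the rise, which is the direction of the effect the theorem predicts for brain-injured subjects.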

A more extensive examination of the 
neurophysiological literature in support of 
the cited neural speculations will be pre- 
sented in a later section of this paper. 


EARLY STUDIES AND THEORETICAL 
CONTRIBUTIONS 


The history of the problem starts with 
Urbantschitsch (1888, 1902). His is still 
one of the most extensive studies of this 
type for he proceeded to investigate all 
the sensory modes, and noted the effect of 
each upon the other. His results, however, 
were not very consistent and controls of 
extraneous physical conditions likely to 
influence the results were almost entirely 
lacking. The existence of intersensory ef- 
fects, however, has been supported by a 
host of later investigators. Heymans in 
1904 investigated the effect of electrical 
stimulation of the hand on auditory sen-
sitivity. Two subjects were used in a se-
ries of trials. 

In one series the proportion of time a 
watch was heard at a given distance was 
used as a criterion of auditory sensitivity. 
Without electrical stimulation, the watch 
was heard almost all the time, but with in- 
creased intensity of shock the time during 
which the watch was heard was reduced, in 
some instances, to as little as 69 seconds in 
a 5-minute period. In another series, the 
distance threshold for auditory sensitivity
under the same condition was tested. It
decreased from about 2 meters without
shock to about 1 meter with the most in- 
tense shock, which again indicated a de- 
crease of auditory sensitivity. 

A few years later, Jacobsen (1911) pre- 
sented evidence to demonstrate that sound 
diminished the strength or intensity of 
weight sensation. This was investigated 
with a judgment of weights with and 
without a simultaneous sound. He also re- 
versed the procedure and reported that 
sounds were judged to be louder without 
concomitant pressure sensations. 

In contradiction to these demonstra- 
tions of sensory inhibition there are other 
very similar studies illustrating a facilita- 
tive effect of a secondary sensory stimu- 
lus upon a primary response. Ide (1919) 
tested the effect of temperature on weight 
judging. Both cold (45 degrees Fahrenheit)
and hot (147 degrees Fahrenheit) weights
were found to feel heavier than compari- 
son weights at room temperature. Ide be- 
lieved that the effect was simply a matter 
of increasing the total amount of sensation. 

Hartman (1933) studied the effects of
simultaneously presented stimuli upon an 
acuity threshold. His subjects judged the 
ease of discrimination of figures on a con- 
trasting background (black on white and 
white on black). He reports evidence in- 
dicating that visual acuity can temporar- 
ily be increased by the simultaneous appli- 
cation of auditory, olfactory, and cutaneous 
stimuli, and that high and low tones, pleas- 
ant and unpleasant odors, mild tactile and 
pronouncedly painful stimuli, all enhance 
the ability to discriminate the test config- 
urations. 

These contradictory results indicate that
heteromodal stimulation involves more
than a determination of whether or not 
auxiliary stimulation is either inhibitory 
or facilitative. Gilbert (1941) has pointed 
out, in an article reviewing intersensory ef- 
fects, that temporal and quantitative fac- 
tors should be considered in examining 
heteromodal results. This article notes that
Jacobsen recognized that a continued auxil-
iary stimulus could lose much of its in-
hibitory power and that it might, under 
certain conditions, augment rather than in- 
hibit a primary response. Gilbert’s conclu- 
sions, from an analysis of the studies re- 
viewed, suggest a key to understanding the 
effects described and involve the following 
considerations. (a) A sufficiently intense 
stimulus will momentarily reduce sensitivity 
in another modality, and increase it after 
an optimum interval. (b) A less intense 
heteromodal stimulus will momentarily in- 
crease sensitivity. 

Since this hypothesis allows an integra- 
tion of results that must otherwise appear 
contradictory it is useful in the review of 
the investigations presented below. 

In 1923, Newhall, and in 1934, Thorne, 
studied the effect of auditory stimulation 
on visual sensitivity with both stimuli pre- 
sented simultaneously. In Newhall’s study 
the subjects judged a given liminal light as 
superthreshold when clicks were added. 
Thorne’s results were based on measure- 
ments of a visual threshold made under 
conditions of silence and a simultaneous 
buzzer. The effects of the buzzer were not 
constant; both facilitative and inhibitory 
effects were observed. On the whole, how- 
ever, inhibitory effects were much more 
marked. 

Thorne suggested that when the auxil-
iary stimulus was relatively strong it “be-
comes a figure in the perceptual figure-
ground relationship and raises liminal sen-
sitivity or exerts an inhibitory effect; when
it continuously occupies the ground, it fa- 
cilitates with resulting lowering of the 
threshold.” 

When we apply the Gilbert hypothesis to 
these two studies, we would expect New- 
hall's results to reflect a facilitative effect 
since a click must be a relatively mild 
auxiliary stimulus. Thorne’s buzzer, on the
other hand, could be loud enough to pro-
duce the predicted inhibition. It is of 
interest to remember that Urbantschitsch 
reported that a loud tone or noise was nec- 
essary to achieve the inhibitory effect. In 
considering this, it is also of interest to 
note Jacobsen’s stimulus conditions. He 
found that when pressure was inhibited 
by sound, weights of 10 to 30 grams were 
used; but when pressure inhibited auditory 
sensation, a weight of 300 grams was used 
as the stimulator of the auxiliary sensa- 
tion of pressure. 

The investigator who has produced the 
greatest amount of well-controlled quanti- 
tative data in this field is Kravkov (1930, 
1936). He has investigated a wide variety 
of stimuli effecting a modification of the 
visual process. His results indicate that a 
concomitant auditory stimulation has a 
pronounced effect on both peripheral and 
foveal vision. He found that a tone of 2,100 
cycles per second and 100 decibels greatly 
diminished light sensitivity for peripheral 
vision but increased sensitivity of foveal 
vision to white light (Kravkov, 1936). His 
studies also indicated that visual acuity 
could be altered with a concomitant audi- 
tory stimulus (Kravkov, 1930). His find- 
ings have been substantiated by Semenov- 
skaia (1946), who found a similar effect 
under comparable conditions, and by 
Bogoslowski and Kravkov (1941) who 
found that the noise of an airplane motor 
raises the threshold of the rod apparatus 
in night vision, while in foveal and day vi- 
sion the threshold is lowered. 

The explanation offered for these effects 
is based on the concept of irradiation. 
Kravkov and his followers suggest that 
the excitation in the brain originating with 
the sound does not remain strictly loca- 
lized, but is transmitted to neurons of the 
optic nerve, because of their anatomical 
proximity; thus subliminal excitation is 
created in the visual center. 

In these studies strong accessory stimula- 
tion resulted in both facilitative and inhibi- 
tive effects. The latter is in agreement with 
Gilbert’s analytical scheme, but the facili- 
tation of foveal vision is contradictory, and 
will be examined in greater detail in a 
later section of this paper. 


Child and Wendt (1938) experimented 
with the influence of a flash of light upon 
the audibility threshold. This threshold was 
determined by a tone of 165-millisecond 
duration. Their principal experimental var-
iable was the time interval between the
flash of light and the tone. When the light 
and the tone were simultaneous, or when 
the light preceded the tone by ½ sec-
ond or 1 second, there was a highly relia-
ble increase in the frequency with which 
near-threshold tones were reported as 
heard. The maximum effect was found
when the light preceded the tone by ½
second; when it preceded the tone by 2 seconds, there
was no consistent facilitating effect. It
should be noted, however, that the auxil- 
iary stimulus was a 2-degree circular patch 
of light of approximately 50 footcandles,
one-tenth of a second in duration. Here 
again the intensity of the auxiliary stimu- 
lus is not great, and inhibition would not 
be expected from the application of the 
assumptions that we have considered in 
connection with the previous studies. 

Child and Wendt’s study sharpens the 
importance of very short-time relationships 
and permits the introduction of a central 
explanation of the effects that is consistent 
with the findings of investigators in neuro- 
physiology. 

Child and Wendt felt that their findings 
suggest a temporal summation of excita- 
tion in the central nervous system. Facili- 
tation was interpreted as being due to 
the convergence of the two sets of impulses 
upon a final common path, and intervals 
which permit facilitation were interpreted 
to be a function of the relative latency and 
recruitment periods of the two converging 
excitations. One difficulty with this explana- 
tion is that the facilitating intervals are, 
in general, longer than would be expected 
from the work of such people as Hilgard 
(1933) and Wendt (1930) who found 
auxiliary stimulation could facilitate a re- 
flexive response only when the time interval 
was less than 300 milliseconds. Also, in 
motor summation studies it has almost al- 
ways been found that when the interval be- 
tween stimuli is increased beyond the maxi- 
mum interval that permits facilitation, an 
inhibitory effect is exhibited, and that the
magnitude of the inhibitory effect is often
greater than that of the facilitating effect. 

The history of the literature focuses at- 
tention on the need for exacting temporal 
investigation in heteromodal studies, and 
it points to the importance of intensity of 
stimulation in intersensory effects. How- 
ever, no direct investigation of strong con- 
tinuous accessory stimulation has been at- 
tempted. An examination of such influences 
is carried out in the present thesis. 

The literature also attempts to relate 
specific behavior to neural functioning, to 
bridge a gap between what is usually 
thought of as two different levels of ap- 
proach to the explanation of behavior. In 
recent years such attempts have achieved
considerable success. Hebb’s (1949) theory
of neural processes in behavior offers chal-
lenging possibilities, and the theoretical
framework is complete enough to offer an
explanation for a wide variety of psycho-
logical events. Köhler and Wallach (1944)
have advanced a theory of cortical func-
tioning based on the perception of simple
figures. Klein and Krech (1952) have
extended Köhler’s theory to account for
inter- and intraindividual differences in be-
havior. As a result of their study investi-
gating figural aftereffects, they have sug-
gested that the “concrete behavior” of
brain-injured individuals described by
Goldstein and Scheerer (1941), Werner
(1940), and others may be viewed as in-
stances of disturbed integration. They feel
that such behavior may be attributed to an
impaired communication process among
different cortical areas.

Klein and Krech’s study then further
suggests that the combined effect of two
simultaneously presented stimuli, which
normally alter cortical functioning enough
to affect a visual threshold, would be less
pronounced in brain-damaged subjects.

The study described in the following 
pages attempts to examine this possibility. 


METHOD


Subjects 


Two groups of subjects were used in this study; 
one organically “normal” and the other made up 
of patients with cerebral lesions of a severe nature.


The "normal" group consisted of attendants at 
the Boston Veterans Administration Hospital. 
These subjects were free of both known cerebral 
lesions and serious visual or auditory anomalies 
as determined by a routine physical examination 
required by the Veterans Administration of such 
employees.

The experimental group was drawn from the 
neurological wards at the Boston Veterans Admin- 
istration Hospital. Subjects known to be free of 
serious auditory or visual deficiencies, but with 
known cortical damage, were selected. 

The two groups were equated with the following 
criteria: 

1. The age range included subjects from 18 to 
45. Within this age group there is no known reason 
to expect any differences in general physiological 
functioning. 

2. Only male subjects were used in order to 
avoid problems associated with sex differences. 

3. Intelligence was roughly equated by limiting 
those selected to an IQ range of 85 to 120. The
Stanford-Binet vocabulary test was used when a 
more extensive evaluation was unavailable. The 
patients usually had complete psychological analy- 
ses and these subjects were selected on the recom- 
mendation of the staff psychologist who had 
worked with the patient. Inasmuch as some brain- 
injured patients suffered from various degrees of 
aphasia, a vocabulary index of intelligence was 
not always felt to be valid, and thus more exten- 
sive test results were used. 

4. Subjects who had had long experience at 
such occupations as radio code receiving or posi- 
tions involving extreme visual acuity were ex-
cluded.


Apparatus 


The apparatus consisted of an observation box, 
an adaptometer (NDR-C Model 2A), a signal 
generator with earphones, and recording equip- 
ment (Figure 2). 

The observation box was fitted with a rubber- 
edged eye shield that served effectively as a head 
rest. At approximately 20½ inches from the eye
of the observer a small red fixation light was 
placed in the center of the observer's visual field. 
At 2 degrees (visual angle) below this, the target 
light was placed. Its diameter was ys inch. The 
surface of the target area was diffused by a trans- 
lucent screen. The light source for the target area 
was a 6-volt filament lamp. The filament current 
was supplied from 110 ac lines, but a Sola constant 
voltage transformer was used to avoid line fluc- 
tuations before the current was dropped to 6 volts. 
The fixation light was supplied separately with 
the use of a variac. 

A Hartline adaptometer with a 2-log-unit glass 
wedge was used to control the target light inten- 
sity. The wedge was automatically driven by a 
small high speed motor. A button which the sub- 
ject held in his hand would instantaneously re- 
verse the direction of the motor and thus the di- 
rection of the wedge. When the motor was turned
on it ran continuously; only its direction could be
altered.

Fig. 2. Apparatus for the determination of visual thresholds.

A blue filter was placed between the wedge and 
the target area to neutralize the yellowish cast of 
the tungsten bulb. 

In order to record the movement of the wedge 
a small drum was attached to its driving shaft, 
and a small chain, winding and unwinding on the 
drum, moved a recording pen back and forth over 
recording paper driven at 2.75 inches per minute. 

The signal generator was an audio-oscillator 
producing a tone of 1,550 cycles per second. The 
earphones were type ANB-H-1 with sheepskin ear 
pads. 

An adjustable chin rest was used to control 
head movement. 


Procedures 


All subjects were administered two or more 
drops of 1% euphthalmine, depending upon the
amount needed to achieve an effective mydriasis.
They were then dark adapted for ½ hour before
the experimenting began. During this time the 
general procedure of the test runs was outlined 
to the subjects, and they listened for several mo- 
ments to the sound used in the experiment in or- 
der to familiarize them with this stimulus. 

The subjects were seated comfortably at a 
table with their heads supported by the head and 
chin rest on the adaptometer. In this position, two 
lights were visible, the red fixation light and the 
target patch. The subjects were told that the white 
light would gradually grow dimmer. They were 
instructed to push the response key the moment it 
reappeared. 

A practice period of about 20 minutes was 
needed usually before a subject developed a rea- 
sonable proficiency in responding to the fluctuat- 
ing light. All trials began with the target light at
maximum intensity in order to provide a common
starting intensity for all subjects. 

Response speed was checked by suddenly cov- 
ering the light source during a descending thresh- 
old determination. An immediate reversal of the 
recording pen, with this interruption, indicated an 
alert subject and a quick response. The completion 
of the practice series was followed by a short rest 
period. The glass wedge was then reset for maxi- 
mum intensity of illumination and the testing 
started. The motor operating the adaptometer 
was turned on and the subsequent keying responses 
of the subject automatically recorded his thresh- 
olds. At the end of 2½ minutes a moderately loud
tone of approximately 70 decibels (4 volts across
earphones) and a frequency of 1,550 cycles per
second was introduced by means of earphones
placed on the head at the beginning of the ex-
periment. The tone was maintained for 3½ min-
utes. At the end of this time the subjects were
given a rest of 4 to 5 minutes. Two more 6-minute 
runs with an intervening rest period completed 
the series. 

All threshold determinations were carried out 
in a completely darkened and reasonably silent 
room. An attempt to control the proper fixation by 
the subjects was made by reminding them to 
“watch the red light” a number of times during
the series. It was always repeated a short time 
before the sound was turned on. 

The equipment used was designed to reduce 
extraneous variables to a minimum. However, it
is impossible to rule out such influences entirely. 
The comments of many of the subjects, in this 
study, indicated that the 6-minute period in which 
they constantly responded to the visual stimuli 
was too long and exacting to be a comfortable 
experience. There is no evidence to indicate that 
fatigue seriously influenced the results, but the 
patience of some subjects was apparently tried.


RESULTS


Analysis of the record. Figure 3 shows a 
section taken from the record of a normal 
subject. The plateau at the left traces the 
constant intensity of illumination at the 
onset of testing. 

The first drop follows the decrement of 
illumination as the glass wedge rotates in 
front of the light source. The trough im- 
mediately following the drop indicates the 
time and intensity point at which the 
subject signaled the disappearance of the 
target. The subsequent rise traces the time 
and intensity during which the subject 
reported the stimulus light as absent. The 
following peak indicates the reported reap- 
pearance of the stimulus. The subsequent 
tracings repeat this cycle. 

This record provides a series of ascend-
ing and descending thresholds comparable 
to the method of limits. In this case, of 
course, the light is a steady one and no 
discrete flashes are involved. The values 
used for computation were the distances 
of peaks or troughs, in millimeters, from 
the lower edge of the record. In this way 
low thresholds were represented by low 
values and high thresholds by high values. 

The trough, in a record, was considered 
a measure of the last visible intensity for 
the subject, and the subsequent peak as a 
measure of the intensity at which the 
light reappeared for the subject. As in the 
method of limits, the mean of two such 
transition points was assumed to measure 
the momentary stimulus threshold. The 
peaks and troughs were combined in pairs, 
and the average for each pair was taken 
as a single determination. 
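The scoring rule just described can be sketched in a few lines. The function names and the sample readings below are hypothetical; the only elements taken from the text are the pairing of each trough with the subsequent peak, the averaging of each pair into one determination, and the use of distances in millimeters from the lower edge of the record.

```python
def thresholds_from_record(troughs, peaks):
    """Average each trough with the peak that follows it.

    Both lists hold distances in millimeters from the lower edge of the
    record; each trough-peak pair yields one threshold determination.
    """
    if len(troughs) != len(peaks):
        raise ValueError("each trough must be paired with one peak")
    return [(trough + peak) / 2.0 for trough, peak in zip(troughs, peaks)]


def period_mean(thresholds):
    """Mean threshold for one 1-minute sample."""
    return sum(thresholds) / len(thresholds)


# Hypothetical readings, not data from the study.
troughs = [78.0, 79.0, 77.5]
peaks = [84.0, 83.0, 84.5]
determinations = thresholds_from_record(troughs, peaks)
print(determinations, period_mean(determinations))
```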




The effect of sound. Figure 3 traces the 
visual thresholds of a normal subject for 6 
minutes. The first 2½ minutes show the
thresholds before sound is introduced; the
next 3½ minutes show the thresholds while
the subject is under the influence of the 
constant auditory stimulus. The point 
where sound is introduced is indicated by 
an “S” on the record. 

In order to test the effects of the sound 
stimulus on the visual threshold, four 1- 
minute samples were drawn from each 
record. Such 1-minute periods provided 
equal temporal divisions, with only a few 
introductory and terminal thresholds dis- 
regarded. The first two samples were made 
up of threshold readings of the two 1- 
minute periods immediately prior to the 
introduction of sound. The third and 
fourth samples consisted of the threshold 
readings for the third and fourth minutes 
of sound. The thresholds for each time 
period were then averaged. 

The averages for each of the four time 
periods in individual records were based 
on the same number of threshold readings. 
Usually the number of thresholds in each
time period was the same. Occasionally a
time period had two or three more thresh- 
old readings than another. In this case 
thresholds were numbered from the begin- 
ning of a time period and superfluous 
thresholds were rejected on the basis of 
the appearance of this number in a random 
number table. 
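The rejection procedure can be sketched as follows, with a seeded pseudorandom generator standing in for the random number table. This substitution, the function name `equalize_counts`, and the sample readings are assumptions made for illustration; the text specifies only that thresholds were numbered from the start of each period and that the surplus ones were rejected at random.

```python
import random


def equalize_counts(periods, seed=0):
    """Trim every period to the length of the shortest one.

    Readings are numbered from the beginning of each period, and the
    surplus positions are drawn at random and rejected. The order of
    the retained readings is preserved.
    """
    rng = random.Random(seed)
    target = min(len(readings) for readings in periods)
    equalized = []
    for readings in periods:
        surplus = len(readings) - target
        rejected = set(rng.sample(range(len(readings)), surplus))
        equalized.append(
            [value for position, value in enumerate(readings)
             if position not in rejected]
        )
    return equalized


# Hypothetical readings for three time periods of unequal length.
periods = [[81.0, 80.5, 81.5], [82.0, 83.0, 82.5, 81.0, 82.0], [80.0, 81.0, 80.5, 81.5]]
print(equalize_counts(periods, seed=1))
```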

Although the same number of thresholds
were used in averaging time periods for a
single individual, different records (inter-
individual) varied in the number of thresh-
olds used in obtaining a time period
average. This was due to individual varia-
tion in threshold determinations in the 1-
minute periods; that is, some subjects in-
dicated a larger number of disappearances
and reappearances of the light during this
time than others.

Fig. 3. Record of visual threshold determinations of a lesion-free subject.

TABLE 1

ANALYSIS-OF-VARIANCE THRESHOLD SCORES AND MEANS FOR NORMAL AND BRAIN-DAMAGED SUBJECTS
FOR FOUR CONDITIONS

Source of variance        df       MS         F

Normal Ss
  Total                   91
  Conditions               3      12.31      7.42**
  Individuals             22     104.28     63.20***
  Residual                66       1.65

Brain-damaged Ss
  Total                   91
  Conditions               3       3.29      1.60*
  Individuals             22     212.35    103.58***
  Residual                66       2.05

Conditions                  1        2        3        4

Means, normal Ss          80.87    80.85    82.41    81.33
Means, brain-damaged Ss   82.18    82.10    82.92    82.54

* p > .05.
** p < .01.
*** p < .001.

Differences in thresholds for sound-on 
and sound-off periods were evaluated by 
separate analysis-of-variance procedures 
for the normal and brain-injured groups, 
with the time periods represented as four 
conditions (Table 1). 
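
A two-way analysis of this kind (individuals by conditions, one score per cell) can be sketched as below. The function name and the small score matrix are hypothetical and serve only to show how the mean squares and F ratios of Table 1 are formed; with 23 individuals and 4 conditions the same computation yields the 3, 22, 66, and 91 degrees of freedom shown in the table.

```python
def two_way_anova(scores):
    """Two-way analysis of variance without replication.

    scores[i][j] is the averaged threshold of individual i under
    condition j (here, one of the four 1-minute time periods).
    """
    n = len(scores)       # individuals
    k = len(scores[0])    # conditions
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_ind = k * sum((m - grand) ** 2 for m in row_means)
    ss_cond = n * sum((m - grand) ** 2 for m in col_means)
    ss_res = ss_total - ss_ind - ss_cond

    df_cond, df_ind = k - 1, n - 1
    df_res = df_cond * df_ind
    ms_cond = ss_cond / df_cond
    ms_ind = ss_ind / df_ind
    ms_res = ss_res / df_res
    return {
        "df": (df_cond, df_ind, df_res, n * k - 1),
        "MS": (ms_cond, ms_ind, ms_res),
        # Each F ratio is formed against the residual mean square.
        "F": (ms_cond / ms_res, ms_ind / ms_res),
    }


# Hypothetical averaged thresholds for three individuals.
example = [
    [80.0, 80.2, 82.1, 81.0],
    [82.5, 82.3, 83.9, 83.4],
    [78.1, 78.4, 79.6, 78.8],
]
result = two_way_anova(example)
```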

The variances associated with individuals 
and conditions were obtained from a two- 
way analysis of these tables. The F values 
in Table 1 indicate the significant differ- 
ences. Individual differences are the most 
striking (p < .001) for both groups. The 
implication of such results, however, is 
hardly startling since individual differ- 
ences in visual sensitivity are obvious to 
the casual observer. The difference between 
conditions is significant for normal sub- 
jects (p < .01), but insignificant for the 
brain-injured cases (p > .05). In both 
tables the means of Conditions 3 and 4
are higher, but the rise is less in the
brain-injured group. These results indicate a
significant rise in threshold for the normal
group under the influence of sound. Confidence
limits placed on the means for
conditions overlap, however, except for
Condition 3. Thus, only the threshold rise
for this period (1.56 millimeters) is
significant.

The variance and means of the presound 
thresholds, included in the tables, are 
slightly higher in the brain-injured group. 
This suggested that the two groups might 
have been drawn from different populations 
of absolute thresholds. A t test of these
thresholds, however, showed no significant 
difference (p > .50). The two groups were 
therefore assumed to be homogeneous with 
respect to absolute threshold, and differ- 
ences in the two groups with respect to 
threshold changes were used for compari- 
son. Since the analyses of variance only 
provided estimates of intragroup thresh- 
olds, t tests were used for between-group 
comparisons. Two postsound periods were 
selected for comparison with the minute 
period just prior to the introduction of 
sound. In the first case the first ½
minute of sound was used. The average
difference of the two periods for normal 
subjects was compared with the average 
difference of these two periods for the 
brain-injured group. The resulting t value
of 1.77 lies between the .05 and .025 level
of significance, and indicates a reliably
greater rise for the normal group during 
this period. In the second case the first 
minute of sound was used. The results do 
not indicate a significant difference (p < 
.13). This suggests that a drop, after the 
initial rise, in the visual threshold is con- 
tributing to the average of this time period. 
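
The between-group comparison of threshold rises can be illustrated with a two-sample t statistic on the difference scores. The text does not state which form of the t test was used; the pooled-variance form below is an assumption, and the function name and data are hypothetical. With 23 subjects per group it gives 44 degrees of freedom, for which a t of 1.77 falling between the .05 and .025 levels would be consistent with a one-tailed test.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(rise_a, rise_b):
    """Pooled-variance t statistic for two independent groups.

    Each list holds one difference score per subject: the threshold
    during sound minus the threshold in the minute before sound.
    Returns the t value and its degrees of freedom.
    """
    na, nb = len(rise_a), len(rise_b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(rise_a) + (nb - 1) * variance(rise_b)) / df
    t = (mean(rise_a) - mean(rise_b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


# Hypothetical rises (millimeters) for a few subjects of each group.
normal_rise = [2.1, 1.4, 1.9, 0.8, 1.6]
injured_rise = [0.9, 0.4, 1.2, 0.7, 0.6]
t_value, degrees = pooled_t(normal_rise, injured_rise)
```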

In order to determine whether or not
the thresholds of normal subjects remained
elevated as long as the sound continued,
the differences between the first 1 minute
and the third minute of sound for each
individual were obtained. A t test applied
to these values demonstrated a significant
drop (p < .01).
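
A t test on within-subject differences of this kind amounts to testing whether the mean difference departs from zero. The sketch below assumes that form; the function name and the readings are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(early, late):
    """t statistic for the mean of within-subject differences.

    early[i] and late[i] are the averaged thresholds of subject i in
    the earlier and the later period of sound. A positive t indicates
    that thresholds dropped from the earlier to the later period.
    """
    diffs = [e - l for e, l in zip(early, late)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


# Hypothetical averaged thresholds (millimeters) for four subjects.
early_sound = [82.6, 83.1, 81.9, 84.0]
third_minute = [81.5, 82.4, 81.6, 82.7]
t_drop, df_drop = paired_t(early_sound, third_minute)
```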

Fig. 4. Mean visual thresholds in relation to
duration of auditory stimulation.

Two problems remain in the analysis of
these data. The first one is concerned with
the rate of rise and fall in individual
records. Figure 4 shows a curve roughly
fitted to averaged thresholds for the normal
group. The first 3 minutes of sound
are represented. The threshold average for
the 1 minute prior to the introduction of
sound was used as the origin of the time
axis. Nineteen points equally spaced along
the time axis represent the points at which
the records were sampled for threshold
values. The question arises as to whether
these values are the result of a relatively
gradual change in visual threshold or are
averages of momentary deflections. In
order to answer this question a ranking
scheme was used. The first three
threshold values after the introduction of
sound were scored plus or minus to indicate
an increase or decrease in threshold value
from the immediately prior response. The
number of positive values was tested
against the number of positive values expected
by chance alone. The results of a
t test indicate a significantly higher number
of positive values than would be expected
by chance (p < .02), and therefore
indicate a gradual rise in the visual threshold.
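
The plus-or-minus scoring can be sketched as below. The study evaluated the count of positive values with a t test; the exact binomial tail used here is a substituted technique chosen because it needs no further assumptions, and the function names, the treatment of ties, and the readings are all hypothetical.

```python
from math import comb


def sign_scores(prior, following):
    """Score each reading +1 or -1 relative to the immediately prior one.

    `prior` is the last threshold before sound; `following` holds the
    first readings after sound is introduced. Ties are skipped
    (assumption: the text does not say how ties were handled).
    """
    scores, previous = [], prior
    for value in following:
        if value != previous:
            scores.append(1 if value > previous else -1)
        previous = value
    return scores


def upper_tail(positives, trials, p=0.5):
    """Chance probability of at least `positives` plus signs in `trials`."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(positives, trials + 1)
    )


# Hypothetical record: one presound reading, then three readings in sound.
scores = sign_scores(80.0, [80.5, 81.0, 80.8])
```

Pooling the scores of all subjects and passing the number of plus signs to `upper_tail` gives the probability of so many rises by chance alone.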

The second problem is concerned with 
the difference between the ascending and 
descending thresholds in the two groups. 
On any record, the distance between a 
peak and a trough is a measure of the 
time a subject takes before indicating the 
appearance or disappearance of the target 
light. An average of such times was obtained
for each subject. A t test was then
used to test the significance of differences
between normal and brain-injured subjects.
The resulting t value of 2.69 indicates a
significant difference (p < .05). Either
there is some perseverative influence in the
perceptual experience of some patients, or
their reaction times in terms of key pressing
are significantly longer than for lesion-free
subjects. In either event it seems
unlikely that the results would be seriously
influenced. In the first instance, it would 
seem probable that the delay of response 
was the same both for the disappearing 
and reappearing target. This would leave 
unaffected the mean value based on these 
two points. In the second instance, it might 
be argued that a perseverative effect op- 
erated in only one direction, that is, while 
the light was either on or off but not both. 
In this case the threshold means would
either be raised or lowered. Such an argu- 
ment, however, would have difficulty ex- 
plaining the insignificant difference in the 
presound thresholds of the patient and nor- 
mal groups. 

Further comparisons. On the whole, normal
records, when compared with those of
brain-damaged cases, are characterized by 
a larger number of thresholds per minute 
and a greater rise with the introduction of 
sound. These factors, however, are not defi- 
nite enough to be obvious by simple in- 
spection. There is a considerable amount 
of overlap; many of the records of brain- 
injured subjects show a response frequency 
as high as normal controls, and there is an 
obvious overlap in threshold rise with the 
introduction of sound. 

Possible vitiating effects. Introspective 
reports from many of the subjects indicated 
that the introduction of the sound was in- 
terpreted as an attempt to distract the 
observer and an effort was usually made 
to disregard the sound. It is impossible to 
evaluate the effectiveness of this attempt, 
however, and it is obvious that changes in 
threshold sensitivity may be brought about 
by involuntary fluctuations of attention. 
Perceptual discrimination of a near thresh- 
old light undoubtedly demands a precise 
adjustment of receptor and cerebral mech- 
anisms, and a distraction or shift of in- 
terest to something outside the task would 
probably change this adjustment and alter 
the effectiveness of the performance. 

However, a consideration of the temporal 
course of the raised threshold in the present 
study is of interest here. Once sound is 
introduced, the raised threshold shows a 
maximum inhibition of visual sensitivity 
some 15-18 seconds later. Since 4-5 sec- 
onds were used to increase the intensity of 
the sound to a maximum this leaves a 
“peak distraction” some 10-12 seconds 
after the introduction of the sound. Such an
effect would require a concept of distraction
with a rather long and relatively constant
interindividual maximization. Usually
distraction is purely a descriptive
designation which is chosen to signify a
momentary disturbance that is somehow
related to an interruption of assumed
neurophysiological processes which are
responsible for the organization of a particular
type of behavior. The curve which
describes the raised threshold in the present
study has already been shown to represent
a relatively gradual climb. Such a process
is inconsistent with the conventional concept
of distraction. The ascription of this
term to this effect might well be quite 
proper, but the value of such an ascription 
without some specificity of process to ac- 
count for the temporal span is obviously 
dubious. 

Accessory stimulation may operate to 
produce its effect in still another way. The 
pupillary reflex may respond to autonomic 
innervation brought about by the relatively 
sudden noise. In the present investigation 
such a pupillary response was controlled 
by the use of the mydriatic euphthalmine.
The effect of this drug is to dilate the pupil 
for several hours, preventing contraction 
to afferent stimuli. Consequently it seems 
reasonable to rule out an effect from this 
source. 

Again, the performance of a given sense 
modality may frequently depend on asso- 
ciations aroused by the stimulation of an- 
other modality. In the complex perception 
of man this is an all-important factor. 
Associations developed over the years and 
the accumulated experience, which to- 
gether become the basis of automatic un- 
conscious deductions, are bound to play 
more or less decisive roles in the accessory 
action of one sense modality on another. 
It is believed that the simple physical 
properties used in the present study were 
innocuous enough to keep such determi- 
nants at a minimum. 

The greater difference in threshold rise 
with the introduction of sound in the nor- 
mal, as compared to the brain-damaged 
group, is significant at the .05 level. This 
is not a strong difference and suggests that 
the location of lesions in the experimental 
group should be related to heteromodal 
effects. The determination of the importance
of such factors, however, is limited
by the theoretical design of the investigation.
In the present study no assumptions
were made as to the exact area involved
in intracortical communication, and thus
no prediction was made as to the type of
injury producing a minimum rise in the
visual threshold. It was felt that the
selection of any experimental group with
lesions so restricted as to include only
areas of particular interest was virtually
impossible. For this reason, an attempt
was made to select cases with gross cerebral
injury in the hope that critical locations
would often be included.


A COMPARISON OF THEORIES


Traditionally, it has been assumed that 
the relation between stimulus and sub- 
jective intensity is a relatively constant 
function. Deviations from this relationship 
were explained in terms of errors of ob- 
servation and the probability function of 
such errors. As these distributions have been 
examined more closely, such factors as 
practice, fatigue, changing metabolic proc- 
esses, and a host of affective phenomena 
have been demonstrated as contributing to 
subjective effects. 

A clarification of one of the psychophys- 
ical problems is necessary before the fac- 
tors under investigation can be isolated 
for consideration. It is helpful if a differ- 
entiation is made between stimulus or 
physical intensity, excitation or physiolog- 
ical intensity, and subjective or psycholog- 
ical intensity. Recent developments in neu- 
rophysiology have enabled us to appreciate 
more thoroughly the neurological basis for 
the psychophysical relation. Studies of 
nerve action currents have thrown some 
light on the correlation between neural ex- 
citation with physical stimuli and the func- 
tional relationship of these two factors to 
a subjective response. 

However, the measurement of the rela- 
tion between stimulus and subjective in- 
tensities is complicated. Modern psychol- 
ogy recognizes this possibility and looks 
for explanations of behavioral processes 
in physical-physiological and physiologi- 
cal-psychological functions, realizing that 
each has its own determinants. 

The studies reviewed here obviously deal 
only with physical stimuli and a behavioral 
response. In most cases the explanations 
either explicitly or implicitly recognize 
that the results obtained include a trans- 
formation of energy from physical stimuli 
to physiological processes and from these 
processes to psychological behavior. 

The theories presented in the following
pages attempt to construct a bridge between
the physical stimulus and the behavioral
response. Only the aspects relating
to accessory stimulation are included.
In this section, the attempt is made to
consider, in a critical and constructive way,
explanations of heteromodal effects that
most closely adhere to and organize the
behavioral data found in the literature and
in the present study.

Holistic or organismic theories. General 
psychological theories have been developed 
in an effort to understand and to predict 
the day-to-day behavior of individuals in 
a variety of roles. Personality theorists, in
particular, have built logical frameworks from
general concepts, supported by success in
prediction and therapeutic progress based
on the theory. Experimentation at this
level has led to the construction of con- 
cepts which are found necessary as build- 
ing units. Such psychological elements are 
viewed as functioning in some type of pat- 
terned relationship. Usually, however, these 
elements and relationships have little ap- 
plicability to the present problem, but the 
theories of Werner (1940) and Lewin 
(1936) are specific enough so that they 
may be used to predict some of the general 
results of this study. Although the ex- 
planations which are offered fail to cover 
all details, they were influential in deter- 
mining the approach to the present study 
and they merit consideration. 

Werner's theory of genetic development,
and, more specifically, his treatment of
psychological processes in brain-injured
subjects, account for an effect in heteromodal
stimulation. He has cited synesthetic
data in support of this theory, postulating
a greater nonspecificity of function in
children. He has suggested a diffusion of
sensory processes in the child which may
gain greater specificity and organization
with age. He postulates that damage to
any part of the body affects the organism
as a whole, reducing the effectiveness of
the integrative processes. Werner would
thus predict an altered visual threshold
with the introduction of sound and altered
effects with subjects suffering from brain
damage. Lewin’s reasoning follows the
same general lines (Lewin, 1936). Goldstein’s
experimentation (Goldstein &
Schurer, 1941) has supported the possibility
of a reduction in integration of psychological
processes when injury to the
brain is present. Again brain injury is seen
to isolate psychological processes, and the
theory thus predicts the reduced effect of
sound in such cases.

A lack of exactness in holistic theories
in the area of heteromodal effects prevents
a critical examination of the present data
by these explanations. Intermodal influences
are predicted along with smaller
effects in brain-injured subjects, but no
detailed account of temporal or intensity
relationships is proposed.

Heteromodal theories. Facilitating ef- 
fects with accessory stimulation were once 
looked upon as an example of the “dy- 
namogenesis” of Fere (1887) and the 
“Bahnung” of Exner (1882). It was assumed 
that accessory stimulation resulted in a 
general physiological tonic effect. No at- 
tempt was made to specify the exact nature 
of this process. 

Most experimenters now agree that the 
processes producing heteromodal effects are 
probably central ones. There is some vari- 
ety in the specific cerebral functioning pos- 
tulated, but it is usually specified or im- 
plied that a better understanding of the 
processes involved lies in the mechanisms 
of neural links. 

The presence of intercentral connections 
makes such a consideration natural. The 
consensus of opinion involves the likelihood 
that an excitation originating in any of 
the receptors is not confined to a particular 
sensory modality but spreads to other areas 
of the nervous system. 

Hartmann (1933), Kravkov (1936), and
Child and Wendt (1938) assume a summation
of afferent excitation in cases where
the accessory stimulus enhances the tested
modality. Their explanations are based on
the neurological concept of “facilitation.”
It is assumed that if two afferent volleys
are small, each may result in the discharge
of impulses by a certain number of
neurons while exciting others to a degree
short of that required to produce discharge
of impulses; in other words, subliminally.
Under such circumstances, the field occupied
by neurons excited to discharge is
called the “discharge zone,” while the field
of neurons receiving subliminal excitation
is called the “subliminal fringe.” When
two nerves are stimulated at the same time,
the resulting discharge may be greater than
the sum of the discharges produced when
each is stimulated individually. It is presumed
that, at some place in the brain, a “subliminal
fringe” of the accessory excitation
overlaps with a “subliminal fringe” of the
tested modality. Thus, when the two
senses are simultaneously stimulated, there
is an enhancement of effects. Hartmann
(1933) has placed the area of overlapping
“subliminal fringes” in the cortex, while
Kravkov (1936) speculated that since the
auditory and visual pathways are proximate
in the corpora quadrigemina, this
area is a likely point of convergence. Child
and Wendt (1938) do not attempt to localize
this process but only point out that
their results support the likelihood of
heteromodal “facilitation.” They found a
maximal effect of an accessory light stimulus
on the auditory threshold when the
two stimuli were presented simultaneously
or within ½ second of one another. These
time intervals are shown to approximate
those of the motor summation studies of
Bowditch and Warren (1890), who studied
the time relationships of the summating
influences of a variety of stimuli upon the
patellar reflex.
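
The notion of overlapping subliminal fringes can be made concrete with a toy threshold model. Everything in the sketch is an illustrative assumption: the neuron labels, the excitation values, the unit firing threshold, and the simple additive rule are hypothetical and are not drawn from any of the studies cited.

```python
def discharging(excitation, threshold=1.0):
    """Neurons excited to discharge (the "discharge zone")."""
    return {name for name, level in excitation.items() if level >= threshold}


def combine(volley_a, volley_b):
    """Add the excitation that two simultaneous volleys deliver to each neuron."""
    total = dict(volley_a)
    for name, level in volley_b.items():
        total[name] = total.get(name, 0.0) + level
    return total


# Hypothetical volleys. The "shared" neurons lie in the subliminal
# fringe of both modalities: each volley alone leaves them below threshold.
visual = {"v1": 1.2, "v2": 1.1, "shared1": 0.6, "shared2": 0.5}
auditory = {"a1": 1.3, "shared1": 0.5, "shared2": 0.6}

alone = len(discharging(visual)) + len(discharging(auditory))
together = len(discharging(combine(visual, auditory)))
```

In this sketch the two volleys presented together discharge more neurons than the sum of the two presented separately, which is the sense of “facilitation” used in these explanations.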

Kravkov (1930) explains an experiment 
on visual acuity in this way. Because of 
“irradiation” light objects against a black 
background take on larger size. Kravkov 
also suggests, “an additional excitation of 
the brain can be produced by different in- 
direct stimuli such as illumination of the 
other eye, a tone, or an odor. In the latter 
cases the excitation of the neurons directly 
involved is transmitted in the neurons of 
the optic nerve, thanks to their anatomical 
proximity; and thus a subliminal excita- 
tion is created in the visual center.” In 
this way an increase of irradiational ef- 
fect would assist discrimination by making 
larger the white interspaces between black 
objects crowded on a white background. 
For the same reason the irradiational effect
would make more difficult the discrimination
of small white objects crowded
together against a black background. The
black interspaces would be reduced to the
perceiving eye, discrimination would thus
be hampered, and visual acuity would
suffer accordingly. However, the theory
cannot adequately explain Kravkov's results
which deal with the effect of sound on
peripheral vision. Here visual sensitivity
was shown to be reduced, and Kravkov can
only hypothesize intercentral connections
that “dominate antagonistically over facilitating
influences [Kravkov, 1936].”

Although most of the recent investigators 
turn to the central nervous system for an 
explanation, Thorne (1934) suggests that 
the results be considered in phenomenolog- 
ical terms. He made measurements of the 
visual threshold under conditions of silence 
and a simultaneous buzzer. He found both 
facilitative and inhibitory effects, with in- 
hibition much more marked. His explana- 
tion is based largely on his own intro- 
spection as he served as a subject. He 
suggests that when the auxiliary stimulus 
is relatively strong, it “becomes a figure in 
the perceptual figure ground relationship 
and raises liminal sensitivity or exerts an 
inhibitory effect; when it continuously oc- 
cupies the ground it facilitates with re- 
sulting lowering of the threshold.” 

Field theory. Gilbert (1941) has turned
to Köhler’s electrical field theory to explain
altered receptor processes. Köhler and
Wallach (1944) conceive of certain cerebral
areas as responding to electrical stimulation
in much the same way as physical
volume conductors do. It is assumed that a
current passing through a given area increases
the resistance of this pathway, thus
inhibiting the immediately following excitation.
Subsequent currents tend to be
shunted into adjacent tissue. This alteration
in the locus of the electrical potential
is assumed to be responsible for a distortion
of perceptual discrimination. The resistance
is termed “polarization,” and it is
used in the physical sense to describe the
production of an electromotive force acting
in the opposite direction to an original
current. Gilbert tentatively suggests that
lines of force originating with excitation
from one modality are hampered in the
shunting process by antagonistic lines of 
force which are created by the introduc- 
tion of the accessory stimulus. This ac- 
counts for the decrement of subjective in- 
tensity in the primary modality. Gilbert 
extends this explanation to account for 
facilitative effects. He suggests that an 
accessory stimulus of moderate intensity, 
introduced and removed at the proper 
times, might facilitate the irradiating lines 
of force from the first stimulated area. 
Although no details of this process are 
given by Gilbert, he implies the use of 
superexcitability which is believed to be a 
momentary effect of the collapsing acces- 
sory field. As the accessory field collapses, 
the momentary flow of the counter electro- 
motive force developed by polarization, 
would aid the invasion of irradiating lines 
of force from the primary modality. 

Such an explanation, although it ac- 
counts for both the inhibitory and the 
facilitative effects, lacks the specificity 
which is needed to explain the decrement- 
ing auditory effects in this study and makes 
no attempt to deal with exacting temporal 
relationships. 

Studies involving brain-injured subjects. 
Controlled studies involving heteromodal 
effects in brain-injured subjects are
conspicuously absent in the literature. Altered
visual effects in subjects suffering from 
severe cerebral lesions, however, have been 
demonstrated. Werner and Thuma (1942) 
have compared birth (brain)-injured chil- 
dren with children who made low scores on 
a standardized test for intelligence and 
found critical flicker fusion to be reliably 
lower on the average in the former. They 
noted that the difference was most marked 
at low intensity levels of the intermittent 
light and virtually disappeared as the inten- 
sity of the flashes was maximized. Halstead 
(1947) found a similar effect of low level 
intensity in brain-injured patients. He in- 
terpreted his findings as providing impor- 
tant clues as to the nature of the processes 
reflected by critical-fusion frequency:
“Here for the first time we have direct
evidence that they (c.f.f.) are central (cerebral)
processes rather than peripheral
(retinal) as they have traditionally been
regarded.” In addition, Halstead and his
associates (1947) have found that the
dominant brainwave rhythm in the monkey 
electroencephalogram can be driven up to 
the point of critical-fusion frequency only, 
and the visual pathway below the cortex 
can be activated by photic stimuli to the 
retinas at suprafusional values. This fact 
is in line with the evidence presented so far 
which links altered visual thresholds with 
cerebral processes. 

Klein and Krech (1952) have not only 
adduced data supporting a central explana- 
tion of altered sensory processes in 
brain-damaged subjects, but they have of- 
fered a theory that allows an interpreta- 
tion of heteromodal effects. 

Klein and Krech have demonstrated
that as an individual is continuously subjected
to kinesthetic stimuli, a decrease
in the subjective intensity of this stimulus
is experienced. They have interpreted
these results as evidence of Köhler's
“polarization.” Since any afferent stimuli
would presumably follow a similar decline, 
any process demonstrating intracortical 
communication would soon show a decre- 
ment in effects. Thus, general results of 
the normal group in the present investiga- 
tion are implicitly predicted. In addition, 
however, Klein and Krech offer evidence
that suggests that polarization is still increasing
at the end of 2 minutes of stimulation.
The curve tracing the effects of
sound in the current study is still dropping
and approaching the presound threshold at
the end of this time. Again, such an explanation
predicts reduced transmission of
excitation through the brain field in
brain-damaged subjects. Polarization, which is
assumed to inhibit communication among
localized cortical regions, is more pronounced,
and occurs after less prolonged
stimulation, in brain lesion cases, as measured
by kinesthetic figural aftereffects.
In this way, intracortical communication
could be influenced earlier and more intensely
in this group.

Klein and Krech thus account for the
decline, after the initial rise with sound, of
the visual threshold described in this paper.
They also account for the smaller rise
in brain-injured subjects.

They do not, however, account for the
initial influence of one sensory modality
upon another; and whether or not their
theory would adequately cover the effects
of sound on the visual threshold at the end
of three minutes of stimulation must await
further experimentation.


BRAIN MODEL AND NEUROLOGICAL
PROCESSES


The background for the postulates in 
the introductory section of this paper was 
described with little authentication for the 
neural processes presented as characteristic 
of a brain model. This section will offer 
documentation and support for the plausi- 
bility of such thinking. 

The physical aspects of signals entering
the central nervous system from a variety
of end organs have been directly examined.
These include studies of visual sense cells
by Hartline and Graham (1932), muscle-stretch
receptors by Matthews (1933),
sense organs for taste by Pumphrey
(1935) and Pfaffmann (1941), pressure receptors
in the carotid sinus by Bronk and
Stella (1932), and pain fibers in the skin
by Adrian (1928), who points out that
two fundamental facts are evident:

1. Though the primary stimuli to the 
sense organs are physically different, the 
signals communicated to the nervous sys- 
tem are alike in that they consist of a 
train of action potentials. 

2. An increase in the intensity of the 
stimulus is reflected in an increase in the 
frequency of transmission of impulses into
the central nervous system without
significant change in the magnitude of the 
individual potentials. 

With such evidence the first postulate,
describing a characteristic of the model in 
Figure 4, is in accord with current neuro- 
logical evidence. It states that the fre- 
quency of neural discharge in a modal net- 
work is positively related to the strength of 
the sensory stimulus. Only the generality 
of the statement requires more support. 
Since the study described here used a
rather definite auditory intensity, it is
important to examine critically the auditory
action potential at that intensity (approximately
70 decibels). Stevens and
Davis (1938) have shown that, as the intensity
of an auditory stimulus is increased,
both the cochlear microphonics
and the action potentials grow larger.
Near threshold the increases can be measured
satisfactorily, but at 30 or 40 decibels
above threshold, measurement becomes
difficult, because the mechanical response
of the ear to the stimulus is not critically
damped, and the action potentials are
superimposed on the later waves of the
microphonics. Precise measurements are
therefore impossible at high sound intensities,
but it can be seen that the action
potential continues to increase as well as
the cochlear microphonics, although they
do not necessarily follow the same law of
increase. These findings add support to the
first postulate and its applicability to the
present study.

It is also presumed (Postulate 4) that with
a constant stimulus the frequency of neural
discharge in its modal network does not
remain constant but reduces to approach
a stable lower level as the stimulus is maintained.
Again evidence for the complete
generality of the assumption is lacking.
With auditory stimulation, however, the
relationship cannot be doubted. It has
been demonstrated by measurement of
the action potential of the auditory nerve
that, immediately after the introduction of
a continuous tone, the action potential
does not remain constant in size but
shrinks, first rapidly, then more slowly, to
a lower amplitude. The same drop was
reported by Galambos and Davis (1943), who
found both a rate adaptation and an
amplitude adaptation. They point out that
the functional refractory period of the
auditory nerve is not always constant. It
increases significantly as stimulation is
continued, and the threshold of stimulation
for each fiber tends to rise also. Since
the action potential is known to depend
upon the number of individual fibers involved,
it is assumed that such equilibration
is the result of a dropping out of a
number of fibers. Stevens and Davis
(1938) suggest that reduction processes, 
extending over several minutes, represent a 
readjustment of the chemical dynamics in 
the nerve fiber and the attainment of a new 
equilibrium between anabolism and catab- 
olism. With an intensity and frequency 
paralleling that used in the present study, 
they demonstrated that the shrinkage of 
the action potential continues for 5 to 7 
minutes after the onset of the stimulating 
tone. 
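
The decline assumed in Postulate 4, rapid at first and then slower toward a stable lower level, can be pictured with a simple exponential approach to an asymptote. The exponential form, the function name, and every parameter value below are assumptions made for illustration; the studies cited report the shape of the decline only qualitatively.

```python
from math import exp


def discharge_rate(t, initial=100.0, steady=60.0, tau=90.0):
    """Hypothetical discharge frequency t seconds after tone onset.

    initial: rate at onset (arbitrary units)
    steady:  stable lower level approached under constant stimulation
    tau:     time constant of the decline, in seconds (assumed)
    """
    return steady + (initial - steady) * exp(-t / tau)


# Rates sampled once a minute over 7 minutes of continuous tone.
rates = [discharge_rate(60 * minute) for minute in range(8)]
```

With these assumed values the rate falls steeply during the first minute and lies within one unit of the steady level by the seventh, which is the general course the postulate requires.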

It is assumed in Postulate 5 that inter- 
ruption of a continuous stimulus would 
permit neural recovery and tend to elimi- 
nate the effects of tetanic stimulation, 

The nonspecificity of neurons is postu- 
lated with no real neurological support. Cer- 
tain effects of sensory impulses, however, 
lend themselves to speculation in this 
direction, Afferent stimulation has been 
demonstrated to lead to as many as three 
distinct responses in the cerebral cortex. 
The responses may be distinguished on the 
basis of latency, threshold, and localiza- 
tion in the cortex. In 1937, Marshall, 
Woolsey, and Bard recorded a lateney of 
8 to 10 milliseconds to afferent stimulation 
in the monkey, and the effect was sharply 
localized in the sensory cortex. In 1939, 
Forbes and Morrison (1939) obtained two 
cortical responses to stimulation of the 
cat’s sciatic with single shocks. An initial 
“primary” response, whose latency was 8
to 10 milliseconds, was followed by a
“secondary” response 30 to 80 milliseconds 
after the stimulus. The secondary response 
was obtained equally well in all regions of 
the cortex on both the contralateral and 
ipsilateral sides. Heinbecker and Bartley 
(1940) describe cortical responses whose 
latencies are similar after stimulation of
the saphenous nerve in unanesthetized 
cats. In addition they describe a third re- 
sponse of still longer latency (400 milli- 
seconds) which occurs only after stimula- 
tion strong enough to activate the KEN 
fibers of the nerve. 

Secondary discharges similar to those 
under discussion have been elicited by 
various types of afferent stimulation. The 
tactile sensory study of Marshall, Woolsey,
and Bard has already been mentioned.
Bishop (1936) has reported similar
“nonspecific” responses with optic
excitation, and even labyrinthine stimulation
has resulted in similar effects, as reported
by Gerebetzoff (1940). Bremer
(Bremer & Bonnet, 1950) reported the
occurrence of a long-latency response
in the visual cortex to auditory
stimulation that was introduced in the 
form of clicks. His observations have 
been more recently confirmed by Thomp- 
son and Sindberg (1960). Again Buser 
(Buser, Bruner, & Sindberg, 1963) has 
suggested a relatively specific location 
for such interaction. The relationship be- 
tween this striking type of electrical ac- 
tivity in the cortex and the function of the 
cortex is as yet unknown. Although no 
specific evidence demonstrates the involve- 
ment of the same neural chains, and 
parallel neural systems may account for 
this “nonspecific” response, the assumption 
that the same system of neurons is aroused 
by two different sensory modalities is now 
a common hypothesis. 

Postulate 3 states the conditions under 
which the hypothetical mutually recruit- 
able neuron shifts its influence from 
one modality to the other. Such a switch- 
ing mechanism has been suggested by 
Gasser (1937) in order to explain recipro- 
cal inhibition, that is, inhibition of a motor 
neuron by kinesthetic impulses from another
motor neuron anatomically near it.
In such a response it is assumed that an 
internuncial neuron takes on the proper- 
ties of a final common path. This neuron 
becomes a switch which determines which 
of two competing reflexes will have the 
right of way. Whichever way the internuncial
neuron goes, so goes the reflex. It
is supposed that the final common path 
may be taken over first by one stimulus 
and then by the other, so that the two 
stimuli take turns at calling forth their 
respective reflexes. One of the factors deciding
which of two competing reflexes occurs
is the relative intensity of their respective
stimuli. Other things being equal,
the stronger stimulus will determine which
of the reflexes will occur. Postulate 3
states this relationship of competing 
stimuli, and relates the recruitment of an 
internuncial neuron to intensity of stim- 
ulation. In this case, of course, the com- 
peting stimuli are assumed to stem from 
two different sensory systems rather than 
from two different responses to the same 
stimuli. 
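
The intensity rule of Postulate 3 can be put as a small sketch. It is only an illustration: the function name, the arbitrary intensity units, and the treatment of exactly equal intensities are assumptions and are not part of the model as stated.

```python
from typing import Optional


def capturing_modality(visual_intensity: float,
                       auditory_intensity: float) -> Optional[str]:
    """Name the modality that captures a hypothetical shared neuron.

    Assumption: "other things being equal", the stronger stimulus wins.
    Equal intensities are left undecided (None), since the text does not
    say how a tie would be resolved.
    """
    if visual_intensity == auditory_intensity:
        return None
    if visual_intensity > auditory_intensity:
        return "visual"
    return "auditory"


# A moderately loud tone against a near-threshold light (arbitrary units).
winner = capturing_modality(visual_intensity=0.2, auditory_intensity=0.7)
```

On this reading the shared neuron behaves like Gasser's internuncial switch, with the two sensory systems taking the place of the two competing reflexes.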

Postulate 6 relates neural phenomena to 
sensory experience. It can only be 
claimed at present that such physiological 
processes are correlated with stimuli po- 
tentially experiential. It is not known 
where “vision” or “audition” are con- 
summated, but neural discharge must be a 
preliminary to them. 

In evaluating the validity of the neural 
processes involved in this brain model, the 
threshold relationship to time is of critical 
interest. As already pointed out under the 
discussion of “attention”, the rise, and 
subsequent fall, of this threshold is a rela- 
tively gradual process. The maximum rise 
for the normal group is not immediate but 
occurs some 10-20 seconds after the intro- 
duction of sound. It then declines toward 
the presound value, approaching it at the 
end of 3 minutes. It is of course assumed 
that with the onset of sound, neurons sensitive
to excitation from either the visual
or auditory pathway are responding to the
auditory stimulus and reducing the subjective
visual experience. As already pointed
out, the auditory pathway fatigues under 
tetanic stimulation, and a decline in the 
effectiveness of the auditory stimulus was,
for this reason, predicted. Galambos and
Davis (1943) have demonstrated that 
auditory nerves respond to a continued 
stimulus of constant intensity by a burst 
of impulses which gradually decline in rate. 
This “rapid equilibration” occurs within a 
second or two. Derbyshire and Davis 
(1935), however, have demonstrated that 
the amplitude of the action potentials at 
the end of 2 seconds of stimulation is only 
a relatively constant value. If photographs 
are taken at appropriate intervals during 
10 minutes following the first two seconds 
of stimulation, the amplitude shows the 
further reduction illustrated in Figure 5. 

Fig. 5. Action potential of the auditory nerve
in relation to duration of stimulation. (After 
Derbyshire & Davis, 1935.) 


The minimum size of the response is at- 
tained in 7 to 10 minutes after the onset of 
the stimulus. Although Figure 5 traces the
fall of potential at a frequency of 875 cycles 
per second, Derbyshire and Davis report 
that the degree and rate of equilibration are
the same at 1,600 cycles per second. This is
consistent with the theory that individual 
nerve fibers respond alternately when pac- 
ing ceases. 

The decrement of the auditory potential 
over this time period closely parallels
the rate of drop demonstrated by the visual 
threshold in the present investigation. Fig- 
ure 4 shows a smoothed curve of the 
averaged thresholds for the normal group. 
Once the threshold has maximized it can 
be seen to follow a rate of decline similar 
to that of the action potential represented 
in Figure 5. 

The correspondence in these curves may 
be fortuitous, but it does offer evidence 
demonstrating that equilibration processes 
continue for periods consonant with the 
explanations offered by the brain model of 
the present study. 

The rather slow rise in the visual thres- 
hold is not very well explained by the 
brain model. It can only be assumed that 
some time, far longer than would be expected
from direct synaptic mechanisms,
is needed to innervate the processes re- 
sulting in the raised threshold. 


Facilitation. The brain model offered in 
the present study allows for both inhibi- 
tive and facilitative processes with acces- 
sory stimulation. It proposes that stimuli 
of low intensity result in “facilitation,” 
and that stimuli of high intensity result 
in inhibitory effects. 

Although simple monosynaptic pathways
exist in the central nervous system, most of 
the central reactions are mediated through 
neuron linkages of great complexity. 
Neurological theory (Fulton, 1949) as- 
sumes that when a volley of impulses 
impinges upon a pool of quiescent neurons, 
some of those neurons are excited to dis- 
charge impulses; others receive only sub- 
liminal grades of excitation; still others re- 
main quiescent. The subliminal fringe 
forms the liaison between the discharge 
zone and the quiescent members of the 
neuron pool. That is to say, if the volley 
is increased, neurons are recruited from 
the subliminal fringe into the discharge 
zone, while others from the quiescent pool 
enter the subliminal fringe; if the volley 
is decreased, neurons from the discharge 
zone are shed into the subliminal fringe 
and other neurons from the subliminal 
fringe retire into the quiescent pool. 
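
The recruitment and shedding just described can be illustrated with a toy neuron pool. This is a minimal sketch under stated assumptions: the two threshold values, the coupling weights, and the rule that excitation equals volley strength times weight are all hypothetical and are not drawn from Fulton (1949).

```python
from collections import Counter

FIRING_THRESHOLD = 1.0   # assumed excitation needed for a neuron to discharge
FRINGE_THRESHOLD = 0.5   # assumed excitation that counts as subliminal


def classify(excitation: float) -> str:
    """Place one neuron in the discharge zone, the fringe, or the quiescent pool."""
    if excitation >= FIRING_THRESHOLD:
        return "discharge"
    if excitation >= FRINGE_THRESHOLD:
        return "fringe"
    return "quiescent"


def zone_counts(volley: float, weights: list) -> Counter:
    """Count neurons per zone; each neuron receives volley * its weight."""
    return Counter(classify(volley * w) for w in weights)


# Hypothetical pool of five neurons with graded coupling to the afferent volley.
weights = [0.2, 0.4, 0.6, 0.8, 1.0]
weak = zone_counts(1.0, weights)    # discharge 1, fringe 2, quiescent 2
strong = zone_counts(2.0, weights)  # discharge 3, fringe 1, quiescent 1
```

Doubling the volley moves the two fringe neurons into the discharge zone and one quiescent neuron into the fringe. Halving it again reverses the movement, which is the shedding described above.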

This process traces the phases through 
which a mutually recruitable neuron is
presumed to pass. With a relatively mild 
auditory stimulus, such a neuron is in- 
cluded in a “subliminal fringe,” thus facili- 
tating the visual process; but as the in- 
tensity increases this neuron is captured 
by the auditory modality; finally it is 
again lost to the auditory system as audi- 
tory equilibration sets in and the volley 
of afferent impulses decreases.
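
These phases can be traced in a short sketch. The exponential decline of the auditory volley, its 60-second time constant, and the two drive levels are assumptions chosen only for illustration. The sketch also ignores the slow initial rise of the visual threshold, which the model does not explain.

```python
import math

FRINGE_LEVEL = 0.3       # assumed drive placing the neuron in the subliminal fringe
CAPTURE_LEVEL = 1.0      # assumed drive at which audition captures the neuron
TIME_CONSTANT_S = 60.0   # assumed equilibration time constant, in seconds


def auditory_drive(intensity: float, seconds_since_onset: float) -> float:
    """Effective auditory volley, assumed to decline exponentially."""
    return intensity * math.exp(-seconds_since_onset / TIME_CONSTANT_S)


def neuron_phase(intensity: float, seconds_since_onset: float) -> str:
    """Phase of a hypothetical mutually recruitable neuron at a given moment."""
    drive = auditory_drive(intensity, seconds_since_onset)
    if drive >= CAPTURE_LEVEL:
        return "captured by audition"   # visual process inhibited
    if drive >= FRINGE_LEVEL:
        return "subliminal fringe"      # visual process facilitated
    return "quiescent"                  # lost to the auditory system


mild_tone = neuron_phase(0.5, 0.0)      # a mild tone at onset
loud_onset = neuron_phase(2.0, 0.0)     # a loud tone at onset
loud_later = neuron_phase(2.0, 180.0)   # the same tone after 3 minutes
```

Under these assumed values the mild tone leaves the neuron in the fringe, while the loud tone captures it at onset and has released it by the end of 3 minutes.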

In general the results of this study sug- 
gest basic processes of cerebral integration. 
Certain limited predictions based on early 
results of this study have been supported 
by specific experimentation. For example, 
changes in critical flicker fusion may be 
predicted. One of the variables in critical 
flicker fusion is the intensity of the flash- 
ing light. A reduction of light intensity 
lowers the fusion rate. Since it has been 
demonstrated that sound will reduce the 
subjective intensity of a light stimulus, it 
could be expected to lower the fusion 
threshold if introduced while fusion determinations
were being run. Gorrell
(1953) has adduced data supporting this 
prediction. He has also found that sound 
is less effective in lowering the fusion rate 
of brain injured subjects. A comparison of 
adults and children has demonstrated a 
significantly different alteration in fusion 
rate under the sound stimulus. 

The studies of Grüsser and Grüsser-Cornehls
(1960) lend general support to
the assumed underlying mechanisms pro- 
posed in this thesis. They have demon- 
strated that the critical flicker frequency 
for single neurons (CFF, the maximum 
frequency at which neurons can follow 
flickering light) can be altered by con- 
comitant vestibular stimulation. It seems 
plausible that such processes underlie the 
subjective flicker fusion experiences just 
described. 

Theoretically any marked stimulation 
would be expected to lower sensory thresh- 
olds. On the basis of this reasoning severe 
anxiety and moderately loud sound may 
be viewed as producing functionally 
equivalent effects. That is, anxiety may, 
through the innervation of the sympathetic 
nervous system, result in marked thalamo- 
cortical firing which could interrupt neural 
patterns associated with sensory processes. 

A demonstration of the effects of such 
physiologically defined anxiety would support
an assumption of generality in bodily
defense systems, and it would aid in detecting
the processes involved in sensory
organization. In this sense, the present 
study brings within the scope of personal- 
ity theory phenomena which were not 
previously regarded as particularly per- 
tinent to it. 


SUMMARY


The study of the processes underlying 
the transmission and coordination of two 
different kinds of sensory excitation in the 
cortex lies within the province of both the 
psychologist and the neurophysiologist. 
Both disciplines are examining factors con- 
tributing to the consummation of such 
integrated experiences. In the present 
study a neurological model accounting for
specified heteromodal effects was proposed.
The method used involved the determination
of visual thresholds in normal and
brain-injured subjects while they were 
exposed to an aural stimulus of moder- 
ately loud intensity. The results demon- 
strated group differences in the influence 
sound has on visual thresholds, and pro- 
vided information on the diminishing ef- 
fectiveness of a constant auxiliary stimu- 
lus when it is maintained for a period of 
several minutes. The results support the 
occurrence of intersensory effects and the 
probability of a centrally located mediat- 
ing process. The introduction of sound ini- 
tially raised the visual threshold of normal 
subjects and lost its effectiveness when 
maintained for a period of 3 minutes. The 
visual thresholds of subjects in the brain-injured
group were not raised as much as
those of the lesion-free subjects by the
introduction of a 1,550-cycle-per-second
tone at approximately 70 decibels above
threshold.

Possible vitiating effects such as dis- 
traction, pupillary reflex, and the meaning- 
fulness of stimuli were discussed. 